Optimized conditions for Listeria, Salmonella and Escherichia whole genome sequencing using the Illumina iSeq100 platform with point-and-click bioinformatic analysis

Whole-genome sequencing (WGS) data have become an integral component of public health investigations and clinical diagnostics. Still, many veterinary diagnostic laboratories cannot afford to implement next generation sequencing (NGS) due to its high cost and the lack of bioinformatic knowledge of the personnel to analyze NGS data. Trying to overcome these problems, and make NGS accessible to every diagnostic laboratory, thirteen veterinary diagnostic laboratories across the United States (US) initiated the assessment of Illumina iSeq100 sequencing platform for whole genome sequencing of important zoonotic foodborne pathogens Escherichia coli, Listeria monocytogenes, and Salmonella enterica. The work presented in this manuscript is a continuation of this multi-laboratory effort. Here, seven AAVLD accredited diagnostic laboratories explored a further reduction in sequencing costs and the usage of user-friendly platforms for genomic data analysis. Our investigation showed that the same genomic library quality could be achieved by using a quarter of the recommended reagent volume and, therefore a fraction of the actual price, and confirmed that Illumina iSeq100 is the most affordable sequencing technology for laboratories with low WGS demand. Furthermore, we prepared step-by-step protocols for genomic data analysis in three popular user-friendly software (BaseSpace, Geneious, and GalaxyTrakr), and we compared the outcomes in terms of genome assembly quality, and species and antimicrobial resistance gene (AMR) identification. No significant differences were found in assembly quality, and the three analysis methods could identify the target bacteria species. However, antimicrobial resistance genes were only identified using BaseSpace and GalaxyTrakr; and GalaxyTrakr was the best tool for this task.

quality, and the three analysis methods could identify the target bacteria species. However, antimicrobial resistance genes were only identified using BaseSpace and GalaxyTrakr; and GalaxyTrakr was the best tool for this task.

Background
Whole-genome sequencing (WGS) has revolutionized the study and diagnosis of infectious diseases [1]. Currently, an increasing number of food, public, and animal health testing laboratories apply next-generation sequencing (NGS) to obtain the complete genetic information of microbial isolates of interest [2]. Similarly, WGS has been used for disease surveillance allowing a more rapid and accurate outbreak detection and source attribution like in the case of the SARS-CoV-2 pandemic [3][4][5], and antimicrobial resistance (AMR) monitoring is important in pathogens in human and veterinary medicine [6][7][8]. However, NGS technology is still inaccessible for many small diagnostic laboratories, especially in the field of veterinary medicine, due to the high costs associated with acquiring and implementing this technology and the lack of personnel with bioinformatics experience to analyze NGS data.
There are currently two main NGS approaches utilized for the sequencing of microbial whole genomes: (i) short-read sequencing which delivers the genomic content of a particular organism in short reads or DNA fragments of 75-400 bp length; and (ii) long-read sequencing that provides the genome information in longer reads, generally above 10,000 bps [9]. Illumina is the short-read technology most widely used, while PacBio Single Molecule Real Time (SMRT) and Oxford Nanopore are the most popular long-read sequencing alternatives. In 2018, the US FDA through the Veterinary Laboratory Investigation and Response Network (Vet-LIRN) piloted the implementation of the Illumina iSeq 100 sequencing platform in veterinary diagnostic laboratories with lower throughput needs [10]. By 2019, 13 out of the 46Vet-LIRN laboratories standardized a common library preparation protocol using Illumina iSeq100 technologies [10] for the whole-genome sequencing of three important animal and human pathogens (Escherichia coli [E. coli], Listeria monocytogenes [L. monocytogenes] and Salmonella enterica [S. enterica]). Furthermore, this multi-laboratory effort translated into a detailed, step-by-step protocol publicly available via protocols.io (https://dx.doi.org/10.17504/ protocols.io.6qpvr44rbgmk/v1); the optimization of the maximum number of isolates that can be included per sequencing run to obtain an acceptable genome coverage; and the evidence that the iSeq100 chemistry and the Illumina DNA Prep library preparation kit is sufficient to produce quality data for antimicrobial resistance surveillance from bacterial isolates [10].
The work contained in this manuscript goes one step beyond in bringing NGS implementation closer to veterinary diagnostic laboratories. We further tested the impact of reducing reagent volume in genomic library quality and subsequent results to further minimize sequencing costs. Additionally, this project details and compares four different pipelines using both the command-line interface (CLI) and user-friendly resources to analyze Illumina iSeq 100 WGS data for bacterial species and AMR identification to overcome the potential lack of bioinformatics experience of involved laboratory personnel.

Bacterial isolates
Two L. monocytogenes, two S. enterica ser. Typhimurium and one E. coli isolate were used in this study (Table 1), and include the same isolates used in the original Vet-LIRN collaborative project [10]. Frozen isolates were sub-cultured twice on Trypticase Soy + 5% Sheep Blood Agar plates (BAPs) or equivalent media prior to DNA extraction.

Laboratory procedures
Bacterial DNA extraction and library prep were carried out following the original protocol (https://dx.doi.org/10.17504/protocols.io.bij8kcrw), using original reagent volumes (X 1) and reducing the reagent volumes to half (X 0.5) or to a quarter (X 0.25) as specified in Table 1. A step-by-step protocol including the optimized reagent volume reduced to a quarter can be found on https://dx.doi.org/10.17504/protocols.io.6qpvr44rbgmk/v1 and it is included for printing as S4 File with this article. Briefly, bacterial genomic DNA was extracted using either the DNeasy Blood & Tissue Kit (Qiagen) or the MagMAX CORE automated extraction kit (Thermo Fisher). The concentration of purified DNA was measured by Qubit fluorometry (Thermo Fisher). Barcoded sequencing libraries were prepared using the DNA Prep kit (Illumina). Sequencing was performed using iSeq100 2 × 150 bp chemistry (Illumina). First, the three bacteria species were run in separate runs: two L. monocytogenes under three library prep conditions (6 samples) were sequenced in run 1; one E. coli under three library prep conditions (3 samples) was sequenced in run 2, and the two S. enterica under two library prep conditions (4 samples) were sequenced in run 3. A 4 th mixed run with one L. monocytogenes, one E. coli, and two S. enterica using a quarter of the recommended reagents was performed (Table 1).

Statistical analysis
Linear Mixed-Effects Models in statistical software GraphPad Prism 6.0 (La Jolla, USA) were used to determine significant differences in the library prep process, read recovery, and genome coverage using different reagent volumes, as well as to assess differences in assembly quality metrics between the different platforms.

Public data submission
Sequencing data generated as part of this project is deposited in GenBank under PRJNA834767.

Minimizing reagent usage for library preparation
With the main goal of making NGS-based diagnosis of infectious diseases accessible to most veterinary diagnostic laboratories, we first investigated the costs associated with sequencing reagents, instrument acquisition, and maintenance for three Illumina sequencing platforms: iSeq100, MiSeq, and NextSeq1000 (Table 2). Our calculations showed that NextSeq1000 had the highest machine acquisition and maintenance cost, while iSeq100 had the lowest. The reagent cost per sample was found to be the same independently of the platform or cartridge used because the Nextera XT DNA Library Preparation Kit is compatible with all Illumina sequencers (Table 2A). These high differences in sequencing associated costs (Table 2B) reside in the fact that Miseq (using the v2 300 cycles cartridge) and Nextseq1000 can sequence substantially more isolates in the same cartridge than iSeq100 or the smaller MiSeq cartridges and still reach a genome coverage of 50X (Table 2A). However, the iSeq100 platform seemed to be the most suitable option for small veterinary diagnostic laboratories because this machine and its maintenance are five times cheaper than MiSeq and ten times cheaper than Next-Seq1000, and run times are comparable. The major limitation of iSeq100; however, is the fact that it can sequence only up to six bacterial genomes with a 50X coverage while a run of MiSeq could sequence up to 36 Listeria isolates at once (Table 2A). Trying to reduce sequencing costs even more, we investigated the impact of using reduced volumes of genomic library prep reagents on sequencing performance and read recovery using Illumina sequencing technology (Table 2B). We decided to compare the outcomes in terms of the number of clusters generated in the sequencing machine, the number of reads obtained in each run, and genome coverage using the manufacturer's recommended reagent volume (X1), versus half volume (X0.5) and a quarter volume (X0.25) for the sequencing of foodborne pathogens E. coli (n = 1), L. monocytogenes (n = 2), and S. enterica ser. Typhimurium (n = 2). Initial bacterial DNA used for library prep was adjusted to each reagent condition, and 300 ng, 150 Table 2. An overview of the cost of using iSeq100, MiSeq, and NextSeq1000 as a diagnostic tool including, expenses associated with instrument acquisition and maintenance (A), and cost-cutting associated with reagent reduction during library prep for iSeq100 (B ng, and 75 ng of DNA were used with X1, X0.5, and X0.25 reagent volumes, respectively. We observed that a reduction in the reagents and initial DNA used to prepare the genomic libraries prior to sequencing translated into a decreased genomic library concentration. The utilization of the manufacturer's recommended volumes yielded library concentrations of~80 nM depending on the bacteria species. In contrast, half (~35 nM) and a quarter (~18 nM) of these concentrations were obtained when using half and a quarter of the recommended values respectively (Table 3). Nevertheless, all library concentrations showed values higher than the minimum 1 nM recommended by the manufacturer to be used for subsequent steps of the sequencing process. Additionally, there were no significant differences in the number of sequencing clusters, the number of reads produced, or genome coverage between the different reagent volumes (Table 3), evidencing that as little as a quarter of the manufacturer's recommended volume yields comparable sequencing performance. We calculated the cost-cutting associated with reagent reduction during library prep for iSeq100 and MiSeq systems (Table 2B). Both Illumina instruments require the same library prep kit. Even when reagent reduction was applied to MiSeq cost, iSeq100 appeared to be the most affordable option when a small number of samples was run (Table 2). However, increasing the number of samples per run in the MiSeq decreases, in turn, the sequencing cost per sample to $70 and $54 when sequencing 24 or 36 samples, respectively, using a quarter of the recommended reagent volume.

Four alternative pipelines for WGS data analysis from raw reads to contigs
Another bottleneck small veterinary diagnostic laboratories may face during the introduction of WGS as a diagnostic tool is the lack of bioinformatics experience to analyze sequencing data. We tested the performance of three user-friendly platforms to analyze NGS data: Galaxy-Trakr [16], BaseSpace (Illumina), and Geneious (Geneious Prime, https://www.geneious. com). Furthermore, we have compared the capabilities of these three online alternatives under default settings (see Materials and Methods) on read filtering and assembly to what was obtained by using an in-house bioinformatics pipeline in the command-line interface (CLI). Fig 1 summarizes the main steps followed for raw read processing. Individualized step-by-step protocols for each platform are available in S1-S3 Files. First, we looked at the programs available in each platform for raw sequence data processing ( Table 4). The same programs were used in the different platforms, when possible, to have more comparable results (Fig 1). Except for Geneious, all platforms had a program such as FastQC [11] to visualize the quality of the raw reads prior to starting data processing. Furthermore, both BaseSpace and Geneious are missing a program to determine the quality of the read assembly. However, for both platforms Table 3. Impact of reducing initial DNA and reagent usage during library prep on read recovery and genome coverage.

PLOS ONE
Optimized conditions for sequencing bacteria using Illumina iSeq100 platform and easy analysis de novo assembler SPAdes [13] provides an indicative table with the most common assembly quality scores (example in S1 and S2 Files). No significant differences in assembly quality were found when comparing the libraries prepared with different reagent volumes (Fig 2). Similarly, there were no significant differences between platforms regarding the number of contigs obtained after assembly, the assembly length, or the assembly quality score N50 (Fig 2). These results further support that a quarter of the recommended reagent volumes is enough to prepare genomic libraries and evidence that any of the three platforms can be used indistinctly for raw read data processing.

Identifying species and AMR genes
We reviewed the capacity of the three user-friendly platforms to pursue genome analysis that will help in the diagnostic process, such as species recognition, serotype and sequence type determination, and antimicrobial resistance gene (ARG) identification. Program options for each platform were identified (Table 4). Geneious did not have any already installed or plugin program among its tools to carry out the tasks mentioned above, but we used the popular alignment tool BLASTn for species identification. BaseSpace is equipped with a "Bacterial Analysis Pipeline" that claims to be able to predict the species of bacterial input genomes (using KmerFinder [15,17]), identifying ARGs (with ResFinder [18,19]), and, depending on the identified species (only for Enterobacteriaceae), performing a Multilocus Sequence Typing (MLST) classification, as well as a plasmid and virulence factor recognition. GalaxyTrakr appeared to be the best-equipped platform with a large selection of programs, including Sendsketch [20] for bacteria species determination, AMRFinder [21] for identifying AMR genes, and SISTR for serotyping. All species tested (E. coli, L. monocytogenes, and S. enterica) were identified regardless of the reagent volume used during library prep or the bioinformatics platform that analyzed the data. Then, we looked at differences in the antimicrobial resistance gene profiles of the tested bacteria identified by the different bioinformatic approaches, and we discovered that BaseSpace failed to identify one of the E. coli ARGs recognized by GalaxyTrakr and the in-house CLI pipeline. All the ARGs were identified in L. monocytogenes (Table 5). Furthermore, Galaxy-Trakr was the only platform that predicted bacteria serotypes.

Discussion
The work included in this manuscript is the continuation of a previous effort to make NGS available for all veterinary diagnostic laboratories [10]. Our first goal in this project was to determine the NGS short-read technology that would better fit the needs of a small/medium veterinary diagnostic laboratory like the Athens Veterinary Diagnostic Laboratory (ADVL) from the University of Georgia. ADVL receives around three to five cases a week that would require further species identification using non-sequencing methods, such as biochemical tests or MALDI-TOF. Hence, if WGS were used as a diagnostic tool instead, the perfect sequencing platform would allow the sequencing of up to five bacterial genomes per run. We compared the cost of acquiring, maintaining, and running the three most popular Illumina platforms iSeq100, MiSeq, and NextSeq1000. Although MiSeq, and NextSeq1000 sequencing platforms offer the potential to sequence up to 25 (MiSeq) and 200 (NextSeq1000) bacterial genomes in one single run, the upfront investment needed to implement MiSeq or NextSeq1000 sequencing is five to ten times higher than iSeq100. With higher number of samples, MiSeq and Next-Seq1000 represent a more affordable option with an estimated cost per sample of $15-54 (using 96-sample library prep kits and a full cartridge), but the per-sample cost doubles when sequencing only six or fewer isolates. Therefore, we concluded that iSeq100 was the most suitable Illumina sequencing platform for laboratories with a small/medium throughput. We are aware that long-read sequencing platforms such as Oxford Nanopore MinION and Flongle adapter are becoming a very popular alternative to Illumina for bacteria WGS due to the $0 instrument acquisition and maintenance cost (Oxford Nanopore starting package includes a MinION sequencer with the first library prep kit ordered, https://store.nanoporetech.com/us/ devices.html). Due to the different bioinformatics analysis tools used for long read sequencing, we did not compare the outcomes obtained from the Illumina platforms with this technology. Future work will focus on performing such comparisons, especially since recent evidence showed that several bacterial isolates can be multiplexed and sequenced simultaneously in the same MinION run with high genome coverage [22]. Genomic library preparation is the costliest step for the whole-genome sequencing of microbial isolates. That pushed several research groups to try producing genomic libraries using reduced reagent volumes to lower the cost [23,24]. In the attempt to make NGS an affordable process for all veterinary diagnostic labs, we investigated if a reduction of the recommended bacterial DNA input for genomic library preparation with a subsequent decrease of reagent volume would impact sequencing performance and outcome data quality. Illumina recommends 4 nM as the starting genomic library concentration to reach the optimal applicable loading concentration [25]. If the concentration of the genomic library obtained is above 4 nM, the library must be diluted to 1 nM to continue with flow cell loading steps (denaturation and further dilution to 100-200 pM in loading buffer) [25]. Hence, producing genomic libraries with a concentration above 1 nM is a waste of library prep reagents and bacterial DNA. Our results showed that even when a quarter of the DNA and reagent volume recommended by Illumina was used, the genomic libraries obtained were at least three times more concentrated than the 1 nM threshold. Subsequently, we did not see any significant effect on sequencing performance, read recovery, and genome coverage (Table 3).
Besides the elevated costs, the lack of bioinformatics knowledge among lab personnel is another bottleneck that hinders NGS implementation in veterinary diagnostic laboratories [26]. We prepared step-by-step protocols for three well-known user-friendly bioinformatic software platforms-Illumina BaseSpace, GalaxyTrakr, and Geneious-and we tested them with the genomic data obtained in the four sequencing runs performed in this study. No significant differences in genome assembly quality were observed using different bioinformatics platforms (Fig 2). Similarly, all targeted species were identified regardless of the software platform used. Hence, if bacteria species identification is the final goal, any of the platforms used in this study can be used.
Geneious was designed for research purposes to assist researchers working with omics data and lacking solid bioinformatics knowledge. It is equipped with an extensive list of programs to analyze genomic, transcriptomic, and proteomic data (https://www.geneious.com). However, Geneious lacks critical programs for diagnostic purposes, such as software for bacterial species determination and ARG recognition. Geneious runs locally on computers that activate the annual license. This is a good thing when the data analysis does not require a lot of computational resources (like when analyzing small bacterial or viral genomes) because the user does not have to face the typical waiting times that happen when sending the analysis remotely to a supercomputer. However, bigger genomes such as fungal genomes may require more computer power than general local computers, and data analysis can crash or take much longer than when working remotely with a supercomputer. Additionally, annual licensing costs for Geneious are between $840 (2 computers in an academic institution) to close to $15,000 (10 computers in a corporation).
BaseSpace is a website developed by Illumina where registered users can easily store, analyze, and share genetic data (https://basespace.illumina.com). One of the benefits of BaseSpace is that users have access through their Illumina accounts, and the sequencing data from the sequencers, including iSeq100, are streamed in real-time over the Internet to BaseSpace at no additional cost. Once the sequencing data is in the BaseSpace Sequencing Hub, the user can access a limited set of free online BaseSpace apps for genomic and transcriptomic data analysis. However, most of the applications required for genomic raw data processing and bacteria species and ARG identification are not free. A yearly BaseSpace Sequence Hub Professional subscription costs $500 and includes 500 iCredits to go towards data storage or app usage. Additional credits can be purchased for~$1 each. The complete bioinformatic analysis using this platform costs about 7 iCredits or~$7 per sample, of which 5 iCredits are for the Bacterial Analysis Pipeline alone. Additionally, BaseSpace failed to identify some of the ARGs found using other approaches, so it may not be the best option for diagnostic labs that require consistent, accurate results for all microorganisms.
GalaxyTrakr [16] is a free, open-source instance created by the US FDA in the bioinformatic platform Galaxy (http://galaxyproject.org) for use by laboratory scientists in the Geno-meTrakr network, the first distributed network of public health and university laboratories that collect and share genomic and geographic data from foodborne pathogens [16]. Galaxy-Trakr adapts the most popular bioinformatics tools used in the CLI to a user-friendly interface so researchers and clinicians without previous bioinformatic experience can benefit from the same resources experienced bioinformaticians use. Therefore, we consider GalaxyTrakr as the best option for diagnostic labs. Limitations of GalaxyTrakr include that analyses may be subject to delays as they are performed remotely in the Galaxy supercomputer, and users are limited in how much data can be stored at any given time.
Altogether, the continued improvement of NGS technologies and resources for bioinformatics is bringing down the cost of sequencing applications. With optimization of DNA input and reagent volumes, an increased number of veterinary diagnostic laboratories can adopt this important tool for tasks such as identifying antimicrobial resistance signatures with minimal or no bioinformatics training. Step-by-step library prep protocol using optimized reagent volume reduced to a quarter. (PDF)