Single gene targeted nanopore sequencing enables simultaneous identification and antimicrobial resistance detection of sexually transmitted infections

Objectives To develop a simple DNA sequencing test for simultaneous identification and antimicrobial resistance (AMR) detection of multiple sexually transmitted infections (STIs). Methods Real-time PCR (qPCR) was initially performed to identify Neisseria gonorrhoeae (NG), Chlamydia trachomatis (CT), Mycoplasma genitalium (MG) and Trichomonas vaginalis (TV) infections among a total of 200 vulvo-vaginal swab samples from female sex workers in Ecuador. qPCR positive samples plus qPCR negative controls for these STIs were subjected to single gene targeted PCR MinION-nanopore sequencing using the smartphone operated MinIT. Results Among 200 vulvo-vaginal swab samples 43 were qPCR positive for at least one of the STIs. Single gene targeted nanopore sequencing generally yielded higher pathogen specific read counts in qPCR positive samples than qPCR negative controls. Of the 26 CT, NG or MG infections identified by qPCR, 25 were clearly distinguishable from qPCR negative controls by read count. Discrimination of TV qPCR positives from qPCR negative controls was poorer as many had low pathogen loads (qPCR cycle threshold >35) which produced few specific reads. Real-time AMR profiling revealed that 3/3 NG samples identified had gyrA mutations associated with fluoroquinolone resistance, 2/10 of TV had mutations related to metronidazole resistance, while none of the MG samples possessed 23S rRNA gene mutations contributing to macrolide resistance. Conclusions Single gene targeted nanopore sequencing for diagnosing and simultaneously identifying key antimicrobial resistance markers for four common genital STIs shows promise. Further work to optimise accuracy, reduce costs and improve speed may allow sustainable approaches for managing STIs and emerging AMR in resource poor and laboratory limited settings.


Introduction
Sexually transmitted infections (STIs) remain a major public health problem worldwide, with an estimated 357 million new cases of Chlamydia trachomatis (CT), Neisseria gonorrhoeae (NG), Treponema pallidum (TP) and Trichomonas vaginalis (TV) per year [1]. High rates of the STI, Mycoplasma genitalium (MG), have also been reported worldwide [2] and is associated with genital discharge syndrome in men and reproductive sequelae in women [3]. All these STIs are generally curable with existing, effective single-dose antibiotic regimens but, if left undiagnosed and/or untreated, can result in serious long-term reproductive health sequelae, particularly for women. Antimicrobial resistance (AMR) among some STIs to multiple classes of antibiotics has spread rapidly in recent years. For NG, loss of extended spectrum cephalosporins as first line empirical treatment is a major concern [4] as circulating multi-and extensive-drug resistance clones have been detected internationally [5]. For MG, macrolide resistance is now widely but not universally reported with increasing rates of resistance to fluoroquinolones also detected [6]. These developments have made treatment, and particularly empirical treatment challenging.
New World Health Organization (WHO) guidelines reinforce the need to treat these STIs with the right antibiotic, at the right dose, and the right time to reduce spread and improve sexual and reproductive health. Laboratory based nucleic-acid amplification tests (NAAT) for detection of NG, CT, TV and MG are well established using various gene targets, for example, GenoQuick 1 CT (Hain Lifescience, Germany) which specifically and simultaneously detects both the gene encoding outer membrane protein 1 (omp-1) and the cryptic plasmid. For both NG and MG, accurate and rapid diagnostics which also predict antibiotic susceptibility are likely to be needed to achieve this. Laboratory based NAAT for detection of NG, CT, TV and MG are the current gold standard for detection, providing high sensitivity and specificity. NAAT are widely used in high-income countries but are often unavailable in resource-poor settings. Culture-based antimicrobial susceptibility testing (AST) for NG remains gold standard for phenotypically predicting AMR and commonly takes usually two to five days to obtain a result in UK microbiology laboratories (personal communication, Sadiq). This in practical terms may be too late to initiate targeted antibiotic therapy, particularly for hard to reach vulnerable populations. MG is difficult to cultivate and requires cell culture, not usually feasible in most clinical settings.
Advances in understanding AMR in NG, MG and TV have allowed for development of NAAT-based AMR detection. For example, absence of gyrA mutations at amino acid positions S91 and D95 of NG accurately predicts fluoroquinolone susceptibility [7,8], presence of mutations at positions 2058 and 2059 (Escherichia coli numbering) in region V of the 23S rRNA gene in MG is associated with failure of treatment with azithromycin [9][10][11], and a single nucleotide polymorphism (SNP) in ntr6 gene (ntr6 A238T) of TV is associated with metronidazole resistance, which may have diagnostic value for metronidazole resistance [12]. Use of NAAT-based AMR tests, however, may have limitations due to continually changing mutations and novel mechanisms of resistance evolving under ongoing treatment selection pressures. Whole genome sequencing (WGS) using high throughput sequencing platforms enables detection of multiple genes/pathogens in a single run, which may address this challenge to some degree and also give added value in identifying phylogenetic relationships in identified infections [13,14]. However, for diagnostic purposes WGS itself can be constrained by sample preparation, cost etc., making it unsuitable for near patient applications.
Oxford Nanopore Technologies' (ONT) portable MinION DNA sequencer, together with its recent MinIT hand-held processor, may offer advantages as an accurate diagnostic in resource-limited settings. Herein we report early phase study among female sex workers (FSWs) in Ecuador, of using the MinION, controlled by a smartphone-operated MinIT to detect the four common STIs (NG, MG, TV and CT) and profile AMR to ciprofloxacin in NG, azithromycin in MG and metronidazole in TV utilising a single gene targeted approach.

Ethics statement
Written and verbal informed consent was obtained from FSW participants all of whom were 18 years or older and none of whom received any compensation for their participation. The study was conducted according to the Declaration of Helsinki and approved by the Ethical Committee of Universidad Internacional del Ecuador (02-02-17).

Clinical sample collection and DNA preparation
A cross-sectional study of STIs was conducted among FSWs in a primary health centre in Quito, Ecuador, during the last quarter of 2017 [15]. Vulvo-vaginal swab samples were collected by clinicians using Xpert 1 CT/NG Patient-Collected Vaginal Swab Specimen Collection Kit (Cepheid). DNA from swab samples was prepared using PureLink Genomic DNA Mini Kit (Thermo Fisher Scientific) according to the manufacturer's instructions and quantified using a NanoDrop Spectrometer. STI identification of swab samples was performed by singleplex qPCR for individual STI pathogen. The DNA samples were stored at -20˚C till the present study.

STI identification of swab samples by qPCR
Swab samples were initially screened by singleplex qPCR to identify NG, TV, MG and CT infections, with well-studied gene targets and primers, as shown in Table 1. qPCR was performed using Applied Biosystems 7500 Fast Real-Time PCR System in a volume of 10 μl containing 5 μl of TaqMan™ Fast Universal PCR Master Mix, 1 μl of 10x Exogenous Internal Positive Control (IPC) Mix, 0.2 μl of 50x IPC DNA, 250 nM each of the primers, 100 nM each of the probes and 50 ng of template DNA. Cycling parameters were 95˚C for 10 min, followed by 40 cycles of 95˚C for 15 sec, 60˚C for 1 min. Positive STI samples and some negatives identified by qPCR were used for this study.

Targeted PCR (tPCR) amplification and barcoding
Both qPCR positive and negative samples for NG, MG, TV and CT plus an NG positive control were amplified by tPCR, targeting genes gyrA (NG), 23S rRNA (MG), ntr6 (TV) and omp1 (CT) with the primers listed in Table 1. tPCR reactions for each target were performed in a 50 μl volume consisting of 25 μl LongAmp 1 Taq 2X Master Mix (NEB), 200 nM each of tPCR primers and 100 ng DNA template as follows: 95˚C for 3 min followed by 35 cycles of 95˚C for 30 sec, 58˚C for 30 sec, 65˚C for 1 min 30 sec, and one cycle of 65˚C for 5 min. tPCR amplicons were purified and subjected to a second PCR for barcoding using PCR Barcoding Expansion 1-96 (EXP-PBC096, ONT) according to the manufacturer's instructions. In this study, we used a sequence-based method for pathogen detection by mapping a 'unique' gene sequence of pathogens of interest to a reference database, and therefore specificity of detection lies within the sequence itself. This is different from conventional nucleic acid amplification methods, in which primer specificity is crucial to detection accuracy. The AMR-associated Table 1. Gene targets, primers and probes used in this study.

a) Initial real-time PCR (qPCR) for STI testing on samples
Trichomonas vaginalis: T. vaginalis specific repeat DNA fragment (amplicon 92 bp) [16] TV_F: AAAGATGGGTGTTTTAAGCTAGATAAGG gene targets we used have a unique nucleotide sequence specific to pathogen of interest and can differentiate itself from other species. As a general practice we have searched nucleotide databases by BLAST when designing tPCR primers in order to discriminate as much as possible the target from other closely related bacterial species. Furthermore, to enhance the read counts of pathogen of interest we selectively designed the tPCR primers with some mismatch base(s) to other closely related bacterial species at the 3' end for preferential target amplification. . QC and Barcoding component contains quality score (qscore) and barcode filter, cutting off reads with a qscore below the threshold (min qscore:7) which were uploaded for analysis, and demultiplexing barcodes if the sequenced samples were barcoded. WIMP (what's In My Pot) allows for species identification by classifying read sequences against the standard Centrifuge database (including RefSeq complete genomes for bacteria). ARMA (antimicrobial resistance mapping application) performs antimicrobial resistance identification by aligning input reads with minmap2 against all reference sequences available in the CARD database (CARD version 1.1.3). TV, an anaerobic, flagellated protozoan parasite, was not included in RefSeq for bacteria and therefore a TV G3 nitroreductase family protein (TVAG_354010, ntr6) reference was uploaded using Fasta Reference Upload r3.2.2 and TV identification was performed using Fastq Custom Alignment r3.2.2 against the ntr6 reference sequence. Read counts were log-transformed and compared between qPCR positive and negative samples by t-test.

BLAST search
DNA sequences of NG, MG and TV reads were extracted from read FASTQ files and BLASTaligned against the reference sequences: Neisseria gonorrhoeae FA 1090 gyrA reference (NC_002946.2:c621189-618439), Mycoplasma genitalium strain G-37 23S rRNA gene (NR_077054.1) and Trichomonas vaginalis G3 nitroreductase family protein reference (TVAG_354010) respectively. Part of the aligned sequences flanking AMR-associated mutations were used to construct a consensus sequence using MultAlin [20]. AMR-associated mutations were confirmed by manually comparing the consensus sequence against the reference.

STI identification of swab samples by qPCR
A total of 200 vulvo-vaginal swab samples from FSWs were initially screened by qPCR, and 43 were found positive, including 37 single infections (11 MG, 19 TV and 7 CT) and six co-infections (two CT/TV and one each of CT/NG, NG/TV, CT/NG/TV and CT/TV/MG), as shown in Table 2.

STI identification and AMR detection by single gene tPCR nanopore sequencing
All 43 qPCR positives plus 12 negatives (3 for each pathogen) and an NG positive control were subjected to single gene tPCR nanopore sequencing in one barcoded sequencing library. A sequencing run of 10 hours produced 1,499,872 reads of which 1,127,703 (76%) reads passed QC, with total yield of 1.2 gigabases, average qscore 8.34 (S1 Fig) and  Targeted amplicon sequencing from clinical samples, as expected, still non-specifically produced some reads from other microbial and/or host DNA in the sample. Among 810,683 classified reads of the whole sequencing library, 560,601 (69%) mapped to the human genome. Details on the bacterial species that was mapped are shown in S1 Table. The ranges of classified reads mapped to the pathogens of interest are 0.5-95% (median 9.6%) for the CT positive samples, 0-86.6% (median 48.5%) for the MG positive samples and 92.3-95% (median 94.7%) for the NG positive samples, while the range of analysed reads mapped to TV in the TV positive samples is 0-99.4% (median 0.1%). However, for STI pathogens, qPCR positive samples generally yielded higher specific read counts than qPCR negative controls, all of the latter of which had fewer than 15 reads (Fig 1). Mean log 10 read counts were higher in qPCR positive NG, MG and CT samples compared to negative by t-test (S2 Table) but not significantly for TV as 14/24 TV read counts in TV qPCR positive samples were very low (<20 reads). NG, MG and CT read counts clearly separated qPCR positive and negative samples except for one MG qPCR positive sample that had an absolute read count of 1 (Fig 1). The samples having <20 reads generally had a low load of pathogen, as evidenced by qPCR cycle threshold (Ct) being >35 (Table 2; S2 Fig).
The ARMA workflow assigned 86,430/810,683 (11%) classified reads to antibiotic resistance genes in the CARD database, >99.99% of which were identified to be fluoroquinolone resistant gyrA gene in NG, using protein variant model. This analysis indicated that 3/3 NG clinical strains identified were fluoroquinolone-resistant, while none of the MG strains were resistant to macrolide antibiotics. The NG positive control, phenotypically susceptible to fluoroquinolones, and genotypically wild-type by BLAST (see below), was mis-identified to be fluoroquinolone-resistant. TV antimicrobial resistance was not profiled by ARMA due to the absence of TV ntr6 entries in the databases. Instead, it was analyzed by manual BLAST search against TV G3 ntr6 reference sequence (see below).

AMR confirmation by manual BLAST search
A manual BLAST search was performed to ascertain the accuracy of the pathogen AMR identified by the ARMA workflow. The BLAST confirmed, as shown in Table 3, that all (3/3) of NG had fluoroquinolone-resistant mutations, 2/10 (20%) of TV had mutations related to metronidazole resistance, and none of MG had macrolide resistance-associated mutations. It also showed that the gyrA of NG positive control did not possess fluoroquinolone-resistant mutations.

Discussion
We demonstrated the potential to test four common genital STIs and simultaneously detect AMR in three of them among vulnerable FSWs in Quito, Ecuador using the portable MinION DNA sequencer, controlled by smartphone, through a single gene targeted approach.
Single gene approaches for detection of infection, such as 16S rRNA sequencing have been increasingly described [21,22], particularly with the advent of long read sequencing such as PacBio and ONT [23], which has greater likelihood of taxonomically identifying organisms at species level. We intentionally evaluated the accuracy of sequencing of a single AMR gene, as opposed to a two-stage gene approach [24], to both diagnose infection and predict AMR simultaneously in order to evaluate potential for use in resource poor field settings where simplicity and cost become increasingly important factors to consider.
Our approach appeared to have some advantages over NAAT. We demonstrated initially that genes responsible for AMR have potential to be as useful as 16S rRNA genes for diagnosis and thus may serve as a 'two-in-one' target. We chose for both pathogen and AMR detection the NG gyrA, MG 23S rRNA and TV ntr6, all gene mutations in which have been shown to be predictive of AMR [7-12, 25, 26], and CT omp-1 for CT detection [27]. As sequencing allows for identification of evolving mutations in the same genes without changing the test, our approach will therefore have value as it is adapted to other gene targets undergoing continuous evolution under selection pressure such as penA in which changes may give rise to penicillin and extended spectrum cephalosporin resistance [28]. The approach can also allow for sequencing multiple genes simultaneously alongside vigilance over any changing nature of phenotypic to genotypic association for antibiotics where AMR involves multiple mechanisms.
Clinical STI samples contain a wide range of pathogen loads being drawn frequently from largely asymptomatic patients as in the case of the FSW cohort in this study. We found that except for TV, diagnosis of NG, CT and MG by the single gene targeted nanopore sequencing appeared correct as almost NG, CT and MG positive samples had high pathogen loads. Due to the small sample size used in this study we did not determine a cut-off value of read counts for a positive STI test, but even so, our single gene approach correctly identified the majority of STI infections in the samples which had qPCR Ct<35 (higher pathogen loads) by their distinctive read counts. In the group with lower pathogen loads (Ct>35), tPCR amplification appeared not to enhance the absolute read counts of pathogen targets, possibly due to low tPCR efficiency caused by the presence of a large quantity of background DNA and non-optimal tPCR conditions, resulting in the 'false-negatives' observed for TV. Further optimization of tPCR conditions, for example enrichment of pathogen DNA during sample DNA preparation, may maximize the diagnostic sensitivity for this group of samples. Regarding the Fastq Antimicrobial Resistance workflow used in this study, the component WIMP correctly identified each pathogen of interest, but we found the component ARMA CARD appeared to mis-profile the AMR of the NG control strain. This might be partially because the ARMA CARD used individual reads for matching AMR associated mutations, supported with the result obtained for the NG control strain by the manual BLAST search using the consensus sequence. A mixed AMR profile may exist in the community of pathogens of clinical samples, and therefore targeted amplification of the AMR associated genes can lead to a mixture of different AMR alleles. Therefore, any further development of ARMA, or a separate workflow, which contains a method to detect and quantify different AMR alleles, would be more appropriate for accurate prediction of AMR.
This study had some limitations. Firstly, there was a small sample size, which impacted particularly on measures of specificity. Secondly no further confirmatory sequencing was performed to confirm the mutations present, largely due to the local regulations which prevented transporting material out of Ecuador. Thus, this study should be regarded as a proof of concept study. Future work is required on larger sample sets with composite reference standards in a formal diagnostic evaluation. Additionally, we used older generation of reagents LSK-108 and tools (MinIt) in this study, which potentially impacted the output and speed of the workflow. Optimisation of amplification and target multiplexing will also be important.
Nanopore sequencing with MinION requires substantially lower infrastructure and startup cost compared to other high throughput sequencing platforms. Although arguably consumables are not optimally priced for resource poor settings, the promise of combining automated library preparations and use of disposable flow-cells has potential to use nanopore sequencing for field diagnostics. We recently demonstrated that even in developed settings implementation of rapid ciprofloxacin NAAT resistance tests for NG still requires net investment [29]. More recently ONT has developed a tablet version, MinION™ Mk1C which enables full sequencing and analysis to be performed in the lab and field without need for internet connection, and released rapid barcoding kits and flongle for sequencing. This development could reduce cost and improve speed further to use nanopore sequencing as a diagnostic tool. However, this will be formally evaluated in a large-scale clinical study.
In conclusion, this study demonstrated that single gene targeted nanopore sequencing for diagnosing and simultaneously identifying key antimicrobial resistance markers for four common STIs shows promise. Further work to optimise accuracy, reduce costs and improve speed may allow sustainable approaches for managing STIs and emerging AMR in resource poor and laboratory limited settings.