Nested PCR followed by NGS: Validation and application for HPV genotyping of Tunisian cervical samples

The most used methodologies for HPV genotyping in Tunisian studies are based on hybridization that are limited to a restricted number of HPV types and to a lack of specificity and sensitivity for same types. Recently, Next-Generation sequencing (NGS) technology has been efficiently used for HPV genotyping. In this work we designed and validated a sensitive genotyping method based on nested PCR followed by NGS. Eighty-six samples were tested for the validation of an HPV genotyping assay based on Nested-PCR followed by NGS. These include, 43 references plasmids and 43 positive HPV clinical cervical specimens previously evaluated with the conventional genotyping method: Reverse Line Hybridization (RLH). Results of genotyping using NGS were compared to those of RLH. The analytical sensitivity of the NGS assay was 1GE/μl per sample. The NGS allowed the detection of all HPV types presented in references plasmids. On the clinical samples, a total of 19 HPV types were detected versus 14 types using RLH. Besides the identification of more HPV types in multiple infection (6 types for NGS versus 4 for RLH), NGS allowed the identification of HPV types that were not detected by RLH. In addition, the NGS assay detected newly HPV types that were not described in Tunisia so far: HPV81, HPV43, HPV74, and HPV62. The high sensitivity and specificity of NGS for HPV genotyping in addition to the identification of new HPV types may justify the use of such technique to provide with high accuracy the profile of circulating types in epidemiological studies.


Introduction
High risk Human Papillomavirus (HR-HPV) infection is known to be the leading cause of Cervical cancer (CC) [1,2]. CC remains the second most common cause of death from gynecological cancer in women worldwide (Globocan 2020) (https://www.uicc.org/news/globocan-2020-new-global-cancer data). In Tunisia, it is the fourth cause of mortality in women with 185 deaths each year (https://www.uicc.org/new-global-cancer-data-globocan-2020). Two hundred HPV genotypes have been characterized to date and more than 40 types have been reported to infect the female genital tract [2,3]. There are many commercial methods available for HPV genotyping which are mainly based on PCR and hybridization [4,5]. However, this approach can recognize sufficiently similar HPV types, allowing the detection of a restricted number of types, and hampering discrimination among nucleotide variants [4,6,7]. Next-Generation Sequencing (NGS) has emerged as a powerful tool for HPV genotyping, with the advantage of processing multiple samples at a time [8][9][10][11][12]. Reported NGS studies of HPV infections have shown a good specificity and sensitivity to study viral diversity. The use of this technology could be important to support pilot epidemiological studies and to assess HPV program vaccination effectiveness.
In a previous study, and in order to evaluate HPV types distribution in Tunisian women, we developed a nested PCR based on detection of HPV L1 using PGMY and GP5+/GP6 + primers, followed by Reverse Line Hybridization (RLH) targeting 31 genotypes [13]. This technology, as many commercial methods, has the limits of detecting a restricted number of HPV genotypes especially in cases of multiple infections and low viral load. It showed that 4% of infections were no typed [13]. In order to solve this issue, this study aimed to design and validate a novel HPV genotyping method based on a nested PCR using modified GP primers followed by a Miseq NGS approach.

Material and methods
The study (study reference: (2014/2A/ONMNE)) was approved by the Ethical committee of Institut Pasteur de Tunis, subscribed on the office of human research protections under the reference IORG001040, and conducted with good clinical practice, ensuring confidentiality and anonymity.

Specimens
A total of 86 samples were tested for the validation of the HPV genotyping assay based on Nested-PCR followed by NGS. These include, 43 references plasmids (WHO HPV proficiency panel) and 43 positive HPV clinical cervical specimens.
WHO HPV proficiency panel. To validate the sensitivity and specificity of the Miseq genotyping approach here reported, 43 samples of the 2014 WHO proficiency study panel were used as reference samples. This panel was composed of purified plasmid DNA in which genomic DNA of different HPV types was cloned and diluted in a background of human placental DNA as previously described [14]. As described in the HPV labnet international proficiency study [14], each panel was composed of 43 samples that contained different amounts of plasmids (5 and 50 GE per 5 μL for HPV16 and HPV18 and 50, and 500 GE per 5 μL for the other high-risk HPV types 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, 68a and 68b, and the low-risk HPV types 6 and 11) and three samples, used as extraction control, that contained HPV positive (HPV 16; SiHa Cervical cancer cells 25/2500 GE/5μl) and HPV negative cell lines (HPVnegative C33A cells). Some samples contained different HPV types in various amounts to mimic multiple infections.
HPV positive clinical samples. The forty-three HPV positive samples identified in an earlier study that examined the prevalence of genital HPV infection in the Region of Grand Tunis [13] were used to be genotyped using NGS approach. The description of clinical samples is available in the previous publication [13].
DNA was extracted from a 200μl aliquot of the suspended cell samples. Extraction and quality control by betaglobin PCR, are as previously described [13]. All samples were positive for betaglobin PCR and were further analyzed for HPV detection and genotyping.
Funding: This research was funding by the Tunisian Ministry of High education represented by the Laboratory of Molecular epidemiology and infectious diseases at Institute Pasteur de Tunis. For her stay in Belo Horizonte, Brazil, the student was beneficiated with financial support from the International Division of Institut Pasteur (Calmette and Yersin fellowship). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Genotyping using Miseq. First PCR amplification. The PGMY primers [15] were used to amplify an L1 HPV DNA region in a 50μl reaction as described previously [13]. Then the PCR products were used in a genotyping protocol using GP5+/GP6+ primers as follows.
Libraries preparation. The PGMY PCR amplicons were used in a second PCR using custom designed primers. These primers contain the GP5+/6+ sequence and additional 10 bases Illumina adapter sequences (P5/P7). The sequences of primers are represented in S1 Table. The PCR reaction was conducted in 25 μl mixture contained 2.5μl of first PCR product, 12.5μl Kapa Mix (Kapa Taq Hot Start Mix, TXHRMKB, Roche). PCR cycling parameters were composed of a 3 minutes of initial denaturation at 95˚C, followed by 14 amplifications of 30 seconds at 94˚C, 1 minute at 55˚C and 1 minute at 72, and a final extension step for 5 minutes at 72˚C.
Different specific indexes were added to the generated amplicons. The size of the PCR products (~200 pb) was evaluated with the 2100 Bio analyzer system (Cat. no. G2940CA; Agilent Technologies, Santa Clara, CA, USA). They were purified using the Agent court1 Apure1 XP beads (Cat. no. A63881; Beckman Coulter Genomics, Danvers, MA, USA) following the manufacturer's instructions. After purification, 96 different primer indexes, using the NextEra XT Index Kit, were added to the end of HPV amplicons. Each amplicon had a different index, which allowed the pooling of all the samples together.
Libraries were purified then quantified using Real Time PCR. Pooling was performed at equimolar ratios (4nM) to yield one sequencing sample. In preparation for cluster generation and sequencing, pooled libraries were denatured with NaOH, diluted with hybridization buffer, and then heat denatured before MiSeq sequencing. The sequencing was performed using the Miseq V2 Kit (Fig 1).
Data analysis. Specific reads of each sample were identified according to their respective corresponding index. Adaptor sequences and primers were trimmed using Trimmomatic 0.33 tool [16]. Quality control of sequencing data was performed using the FastQC software (http:// www.bioinformatics.babraham.ac.uk/projects/fastqc/). The cut-off of read depth was fixed to 30 [17]. Good sequences (Q-Score>30) were firstly mapped into human genome using bowtie, an adapted alignment tool for short reads [18]. Only unmapped reads were assembled to contiguous sequences (contigs) using Trinity software (https://github.com/trinityrnaseq/trinity). We processed only high quality reads having at least 80% of base pairs with base calling accuracy of 99.9%and used stringent assembly parameters in the Trinity assembler.
To identify HPV genotypes, contigs were firstly clustered at 3% pairwise distance and a consensus sequence was generated. Consensus sequences were aligned using Bowtie 2 with the L1 region of alpha Papillomavirus reference sequences that were obtained from Papillomavirus Episteme (PaVE) (PaVE; http://pave.niaid.nih.gov) (Fig 2).

Statistical analysis
HPV genotype frequencies were calculated by simple counts and percentages. Association between HPV infections and read numbers was evaluated using chi-squared test. A p-value <0.05 was considered statistically significant.
The agreement between RLH and NGS was calculated using the Cohen kappa test, inter-

Validation of targeted NGS genotyping by WHO panel
To validate the genotyping approach using Miseq, the designed protocol was used firstly to analyze the 43 WHO 2014 proficiency plasmids panel. The generated data showed that the total number of reads was approximately similar in both HPV positive and negative plasmids. Median length of HPV reads was 174 nucleotides (nt), ranging between 137 and 180, while the modal length of the reads was 160 nt but without significant differences among samples with different HPV types, nor between samples with HR-HPV and LR-HPV, nor between single and multiple infections. Meanwhile, the HPV aligned reads in HPV positive samples (minimum: 30; maximum: 19166; median: 6335) was different from those of HPV negative (maximum: 6). The greatest number of reads was obtained in multiple infections (HPV types >1) (p<10 −3 ) (Fig 3).
To confirm the detected HPV types, a cut-off of 30 reads was used. HPV types in the WHO panel were correctly detected and typed with NGS at different DNA amounts (5 to 2500 GE/ 5μL). The negative control of this panel was also identified as negative (Panel C). HPV 18 was detected at the level of 5GE/5μl and HPV16 at the level of 25 GE/5μl. All samples showed complete concordance for all types, even with multiple infections containing up to 4 genotypes. No other HPV types from the plasmids were detected ( Table 1).

Genotyping of positive HPV clinical samples by Miseq
Forty-three HPV positive clinical samples were genotyped using the designed NGS protocol. These samples were firstly genotyped by RLH protocol in a previous study. The results showed that 12 of the 43 samples (4%) had unknown HPV, and a total of 14 types were identified [13].
Using the designed NGS genotyping assay, a cut-off of 30 reads was used to confirm the identified HPV type in each sample. As results, all samples were typed and a total of 19 types were identified in addition to the detection of new HPV types; HPV81, HPV43, HPV74, and HPV62 (Fig 4). Single infections and multiple infections were detected in 67% (29/43) and 33% (14/43) of cases respectively.
Comparison between NGS and RLH. Using Cohen's Kappa test, a perfect agreement between NGS and RLH was observed (κ = 0.83, 95% C.I. 0.60-1). Among the 29 cases of single Workflow for data analysis. The designed pipeline for HPV genotyping in this study contains 3 steps: reads quality control to remove bad quality reads, processing of retained reads for mapping into Human genome, and finally in the final set of selected sequences the viruses were detected and genotyped by assembly and contigs mapping to HPV reference genome.
https://doi.org/10.1371/journal.pone.0255914.g002  Table. RLH and NGS methodology have both detected HPV16, HPV52, HPV6, HPV40, HPV70,  HPV59, HPV55, HPV39, HPV31, HPV53, HPV68 types (Fig 4, S2 Table). NGS allowed the depended on the HPV types in each positive sample (p<10 −3 ): The aligned HPV reads were more numerous in samples having multiple infections (HPV types number >1): the size of dots indicates the number of HPV types in each types sample. detection of types such as HPV6, 31, 70, 39, 42, 11, and 52, that were not detected by RLH despite the presence of their probes on the membrane. Five types were not among the probes used and were detected for the first time in a Tunisian study: HPV81, HPV74, HPV68, and HPV62 [13] (Fig 4, S1 Table). The table below summarizes the differences between the 2 genotyping assays ( Table 2).

Discussion
In this study, HPV genotyping was designed using a Nested PCR with PGMY and GP5 +/ 6 + followed by Miseq sequencing (Illumina platform). PGMY/GP+ had greater sensitivity and broader coverage of HPV genotypes compared to PYGMY alone, GP+ alone, MY11/09 (MY) alone or MY/GP [19][20][21][22]. The choice of the Illumina platform was based on the low number of read errors that can be generated (<0.4%), as well as its ability to generate short-reads that are more appropriate for HPV analysis. In addition, the preparation of the libraries for the Miseq platform did not require a large amount of DNA (4nM) [8].
Setting the NGS cut-off to 30 reads depth was based on recommendations from others studies [13]. For data analysis, the pipeline contained a step of assembly by Trinity software that was used to avoid the problem of samples containing mixed infections. In fact, small-sized  reads can align with one region in the genome of another type, which causes false positives or over-estimation of one type over others [23]. In order to increase the throughput and reduce the cost of the NGS, a double indexing method was developed; it consisted in using primers with Miseq adaptor and indexes for libraries construction. This allowed genotyping of 86 samples (43 clinical samples + 43 proficiency test panel plasmids) in a single sequencing reaction. There are infinite possibilities of multiplexing since several index kits are available to combine indexes during libraries constructions. The study carried out by Xin Yi et al. (2014) [8] described a multiplexing protocol targeting 1170 samples in a single sequencing reaction. Multiplexing is an advantageous alternative in terms of cost effectiveness since the cost of the reagents required for a sequencing reaction is the most expensive component.
In this study, the results showed that NGS identified higher HPV numbers in the co-infections than RLH. Thus, the detection of multiple HR-HPV infection has become a key issue in the development of cervical lesions and the epidemiological status of the population by efficient genotyping method. In fact, the underestimation of the number of HPV types in patients who developed cervical lesions could underestimate the effect of multiple infections as a risk factor for persistent infections in healthy young women. Different studies have reported that multiple HR-HPV acted synergistically in cervical carcinogenesis [24,25], and cancers with multiple HPV infections could be more resistant to therapy than those due to single infection [26]. Comparing the two genotyping approaches of clinical samples, we noticed the limit of RLH in comparison to NGS; RLH was less sensitive than NGS in detecting HPV31, HPV39, HPV84, HPV42, HPV68, HPV11 and HPV52. Some of these types are known to have an epidemiological relevance. In fact, HPV31 is one of the five most prevalent HPV types in highgrade cervical intraepithelial neoplasia and cervical cancer. It was recently added to a new nano-valent HPV vaccine [27,28]. The lack of detection of these types by RLH may be due to a variant which was not covered by the probe or to a cross reactivity with another type present in the same sample. Meisal R et al., in a study of NGS based HPV genotyping reported a considerable variation in HPV type specificity between NGS and a DNA hybridization method [12].
Using NGS we identified new LR-HPV types that were not identified by RLH in the previous Tunisian study [13]: HPV81, HPV74, HPV43, and HPV62. Confirmation of these observations may be needed in cases where the types had low read numbers [12,29]. This is not the case for the detected types in this study; the average numbers of reads of the new HPV types detected (HPV81, HPV43, HPV74, and HPV62) was considerably above the 30 reads depth cut-off. Cross-contamination between wells or tubes during extraction can also be a reason for observation of false positives especially in PCR based assays [12]. However, a low number of reads would confirm this hypothesis; it was not the case for the herein reported results.
The advantage of the detection of new types using NGS has been reported in other studies; Barzon L et al. [9] and Arroyo et al. [10] compared commercialized HPV genotyping kit to an NGS approach and identified by NGS new HPV types such as HPV74, HPV81, HPV53 and HPV62. NGS also detected a wider range of types in multiple infections and new variants [8-12, 30, 31]. In this context Conway et al. [32], genotyped HPV from oral samples demonstrated the ability of NGS technology to detect other HPV types that would not have been detected by traditional methods.
NGS identified novel LR-HPV (HPV81, HPV43, HPV62 and HPV74) that are considered as rare HPV types [33] and limited studies reported their prevalence. These types were not described so far in Tunisian studies neither in healthy women nor in women with precancerous and cancerous lesions [13,[34][35][36], which further indicates its relevance for epidemiology. Thus it will allow a more precise description of the HPV types distribution in a country. A more precise HPV genotyping will provide better correlation to the cytology and insights as regards their role in intra-epithelial cervical lesions.

Conclusion
This is the first study of HPV genotyping using NGS in Tunisia to our knowledge. This approach was considerably more sensitive than the classical reverse line hybridization method. It was also able to identify novel HPV types that were not detected by RLH or other related methods in Tunisia. This can justify the use of such technique for an accurate assessment of the circulating genotypes for a better epidemiological apprehension. Following on this study, the developed NGS protocol will be used for HPV genotyping of all the clinical samples collected in the frame of an epidemiological study in Tunisia, variants evaluation, and phylogenetic analysis.
Supporting information S1