Resequencing Microarray Technology for Genotyping Human Papillomavirus in Cervical Smears

There are more than 40 human papillomaviruses (HPVs) belonging to the alpha genus that cause sexually transmitted infections; these infections are among the most frequent and can lead to condylomas and anogenital intra-epithelial neoplasia. At least 18 of these viruses are causative agents of anogenital carcinomas. We evaluated the performance of a resequencing microarray for the detection and genotyping of alpha HPVs of clinical significance using cloned HPV DNA. To reduce the number of HPV genotypes tiled on the microarray, we used reconstructed ancestral sequences (RASs), as they are more closely related to the various genotypes than the current genotypes are among themselves. The performance of this approach was tested by genotyping a set of 40 cervical smears already genotyped using the commercial PapilloCheck kit. The results of the two tests were concordant for 70% (28/40) of the samples and compatible for 30% (12/40). Our findings indicate that RASs were able to detect and identify one or several HPVs in clinical samples. Associating RASs with homonym sequences improved the genotyping of HPVs present in cases of multiple infection. In conclusion, we demonstrate the diagnostic potential of resequencing technology for genotyping of HPVs, and illustrate its value both for epidemiological studies and for monitoring the distribution of HPVs in the post-vaccination era.


Introduction
More than 170 human papillomaviruses (HPVs) have been characterized, and they are classified into the alpha, beta, gamma, mu, and nu genera [1]. Some alpha papillomaviruses with a mucosal tropism cause some of the most frequent sexually transmitted infections: around 300 million women worldwide carry an HPV infection of the uterine cervix. About 40 HPV genotypes able to infect genital mucous membranes have been identified. Low-risk mucosal HPVs (LR HPVs) are responsible for condylomas found in 1-2% of the sexually active population. High-risk HPVs (HR HPVs) are causative agents of more than 80% of high-grade cervical intra-epithelial neoplasia [2]. Persistent lesions associated with HR HPVs may evolve towards invasive cervical carcinomas. HPV DNA sequences are detected in the vast majority of adenocarcinomas, adenosquamous carcinomas and squamous cell carcinomas of the cervix, all of which are preceded by premalignant lesions [3]. HR HPVs also contribute to the pathogenesis of other anogenital and aero-digestive tract cancers [4].
The need for effective screening for cervical cancer is now recognized worldwide and testing for HPVs is a potentially powerful approach to the detection of cervical or anogenital lesions. Other than the hybrid capture II and the Cervista detection tests [5], most of the available tests for HPVs are based on PCR methods using degenerate or consensus primers. All these methods allow the detection, and some allow genotyping, of a wide spectrum of HPVs. In addition to their clinical value, robust methods of detection are required for epidemiological studies of HPV infections. In particular, they could be used to study the prevalence and distribution of circulating HPV genotypes in vaccinated and non-vaccinated populations.
Several low- or high-density microarrays have been developed and successfully used for the detection of several viral families such as Papillomaviridae. Most of them are based on long oligonucleotide probes, and virus identification is based only on hybridization patterns. This approach does not provide information for pathogen identification at single-base-pair resolution [6,7]. This limitation has since been overcome with the use of resequencing microarrays (RMAs). RMA technology involves generating, in a single assay, a DNA/RNA sequence corresponding to the pathogen(s) present in a sample. It allows detection of a sequence that diverges by up to 10-15% from those tiled on the microarray [8,9,10]. We have previously used RMAs for the detection of a large panel of viral and bacterial pathogens by resequencing the hybridised DNA. This approach has been successfully applied to diverse clinical situations such as the establishment of a rapid laboratory diagnostic test for pandemic influenza viruses [11,12], the characterization of different strains of the monkeypox virus [13,14], the genotyping of several members of the Rhabdoviridae family [8] and the identification and characterization of arboviruses and hemorrhagic fever viruses in biological samples [15,16].
The aim of this study was to develop a multiplex bioassay method, based on RMA technology, for the detection of mucosal HPV genotypes of public health importance. Because of the wide genetic diversity of HPVs and to reduce the number of genotypes tiled on the microarray, we inferred reconstructed ancestral sequences (RASs); these sequences are more closely related to different genotypes than such genotypes are to each other. The advantages of using reconstructed ancestral sequences for RMAs have been described and illustrated using pathogen identification with Enterobacteriaceae rpoB sequences as models [17]. We then applied the RMA method to the detection of HPV DNA sequences in clinical samples.

Figure 1. Phylogenetic tree of 38 alpha HPV isolates, constructed to assess the resequencing microarray approach. The tree was constructed from the 416-base sequences of part of the L1 ORF in the different HPVs. The tree was generated by the neighbor-joining method using BioNumerics v5.10 software (Applied-Maths, Belgium). HPV sequences tiled on RMA (red squares), location of RASs on the tree (blue circles) and cloned HPVs tested with RMAs (green triangles) are indicated. doi:10.1371/journal.pone.0109301.g001

Principle and content of resequencing microarray
A third-generation resequencing microarray, named VirID v3.0, containing 724 viral sequences distributed into 27 families and 85 genera (Table S1), was used in this study. Among the many viral sequences tiled, a set of 38 different HPV sequences was used. The size of the tiled region ranged from 455 to 482 nucleotides, corresponding to part of the L1 open reading frame. This set includes sequences of the following genotypes: HPV6, 11, 16, 18, 26, 30, 31, 33, 34, 35, 40, 42, 43, 44, 45, 51, 53, 54, 56, 58, 59, 65, 66, 67, 68, 69, 70, 73, 74, 82, 85, 88, 91, 95, 97, 101, 103 and 108. In addition, ten reconstructed ancestral sequences (RASs) were tiled on the microarray. These sequences are described in Tables S1 and S2. For each sequence tiled, the principle of RMA is to interrogate each single base of the unknown sequence to be detected with a set of eight appropriate 25-mer probes. Of these eight probes (four for each strand of the selected sequence), two correspond to perfect matches at the central (13th) position of the probe, and the other six represent all possible mismatches at the same position.
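The probe layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the array manufacturer's design software: the function names, the strand labels "sense" and "antisense", and the choice to express probes in the same orientation as the tiled sequence are all hypothetical.

```python
# Illustrative sketch of resequencing-probe tiling (hypothetical helper names).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PROBE_LEN = 25
CENTER = PROBE_LEN // 2  # 0-based index 12, i.e. the 13th position


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def probes_for_position(target: str, pos: int) -> dict:
    """Return the eight 25-mer probes that interrogate target[pos].

    pos is 0-based and needs 12 flanking bases on each side.
    Keys are (strand, central_base); for each strand, one probe is the
    perfect match and the other three carry a central mismatch.
    """
    if pos < CENTER or pos + CENTER >= len(target):
        raise ValueError("position too close to the end of the tiled sequence")
    window = target[pos - CENTER : pos + CENTER + 1]
    probes = {}
    for strand, seq in (("sense", window), ("antisense", reverse_complement(window))):
        for base in "ACGT":
            # Substitute each of the four bases at the central position.
            probes[(strand, base)] = seq[:CENTER] + base + seq[CENTER + 1 :]
    return probes
```

Calling `probes_for_position(sequence, i)` for every interior position `i` of a tiled region yields eight probes per base, two of which (one per strand) match the tiled sequence exactly.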

Ancestral sequences reconstruction
RASs were designed based on phylogenetic analysis of the nucleotide sequences of a 416-base portion of the L1 ORF of 38 alpha-HPVs; there were neither insertions nor deletions in this stretch. A neighbor-joining tree was obtained using BioNumerics v5.10 software (Applied-Maths, Belgium). Ancestral sequences were reconstructed by maximum likelihood using the PAML v4 software [18]. The nucleotide substitution model used was K80, with the gamma (number of categories of distinct substitution rates) and kappa (transition/transversion ratio) parameters estimated.
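To make the K80 model mentioned above more concrete, the following minimal sketch computes the standard Kimura two-parameter pairwise distance between two aligned sequences. It is not the maximum-likelihood ancestral reconstruction performed by PAML and it ignores gamma-distributed rate variation; the function name is hypothetical and the sequences are assumed to be aligned without gaps.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    P is the proportion of transitions, Q the proportion of transversions,
    and d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)).
    Sites with an ambiguous base (anything other than A, C, G, T) are skipped.
    math.log raises ValueError if the sequences are too divergent (saturated).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Counting transitions and transversions separately is what distinguishes K80 from a simple proportion of differing sites; the kappa parameter estimated in the study reflects the same transition/transversion distinction.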

Extraction and amplification of HPV DNA
Viral DNA was obtained from cloned HPV DNA and from cervical smears provided by the National Reference Laboratory for Human Papillomavirus at Institut Pasteur (Paris). The cloned HPV DNA used for validation assays was amplified and purified as previously described [19]. Total DNA was extracted from cervical smears using the NucleoSpin Tissue extraction kit according to the recommendations of the supplier (Macherey-Nagel, Hoerdt, France). Following extraction, samples were digested with Plasmid-Safe DNase for 12 hours at 37°C according to the manufacturer's instructions to enrich for circular viral DNA (Epicentre). All DNAs were amplified using the Repli-g Mini Kit according to the manufacturer's instructions (Qiagen). The concentration of amplified DNAs was determined with the Quant-iT kit (Life Technologies) and aliquots of 10 µg were used for hybridization in RMA assays.

Ethical considerations
Liquid-based cytology (LBC) samples were collected from women attending organized cervical screening in 16 pilot sites in France. After completion of cytology, residual LBC samples were anonymized and sent to the French HPV Reference Laboratory for genotyping. The residual material would otherwise have been discarded. According to French regulations of biomedical research, a written or verbal informed consent and an ethical approval are not required for such studies. However, the study was

Hybridization on RMA and data analysis
Amplified DNA was fragmented and labeled using the GeneChip Resequencing Assay Kit (Affymetrix Inc., Santa Clara, CA). After overnight hybridization at 45°C, the RMA was washed, stained and scanned according to Affymetrix's instructions. The raw image file (.DAT), obtained after the scan, was converted to a fluorescence intensity file (.CEL). Bases were called by the GeneChip Sequence Analysis Software (GSEQ v4.1), which uses a derivative of the ABACUS base-calling algorithm [20]. This algorithm consists of an automated statistical method that analyses raw RMA hybridization data and optimizes the base-calling process. For each base call, a quality score is calculated from the difference between the best-fitting model and the second-best one. If this score is below a chosen threshold, an undetermined base "N" is assigned to that position. Sequences were output in FASTA format; for each HPV sequence obtained, the call rate value was calculated as the ratio between the number of bases determined (A, T, C or G) and the entire sequence length. The accuracy of the RMA process was determined as the ratio between the number of correctly determined bases and the total number of determined bases, by comparison with the known sequence of the strains tested.
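The two metrics defined above can be written down directly. The sketch below is a minimal illustration, not part of the GSEQ software: the function names are hypothetical, values are expressed as percentages, and the called sequence is assumed to be aligned position by position with the reference (no insertions or deletions).

```python
def call_rate(called: str) -> float:
    """Percentage of positions called as A, C, G or T ("N" means no call)."""
    if not called:
        raise ValueError("empty sequence")
    determined = sum(base in "ACGT" for base in called.upper())
    return 100.0 * determined / len(called)


def accuracy(called: str, reference: str) -> float:
    """Percentage of determined bases that agree with the known reference."""
    if len(called) != len(reference):
        raise ValueError("called and reference sequences must have equal length")
    # Keep only positions where a base was actually called.
    pairs = [(c, r) for c, r in zip(called.upper(), reference.upper()) if c in "ACGT"]
    if not pairs:
        raise ValueError("no determined bases")
    correct = sum(c == r for c, r in pairs)
    return 100.0 * correct / len(pairs)
```

Note that the denominators differ: the call rate is computed over the entire sequence length, whereas the accuracy is computed only over the bases that were determined, so a sequence with many "N" calls can still have a high accuracy.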

PapilloCheck HPV genotyping test
Clinical samples were tested using the PapilloCheck HPV genotyping kit (Greiner BioOne, Frickenhausen, Germany). This test involves PCR amplification with fluorescent primers (CY5 fluorophore) specific for the HPV E1 gene and the cellular housekeeping gene ADAT1. The amplification products are then hybridized to a DNA chip containing sequences for 18 high-risk genotypes (HPV 16, 18, 31

Validation of RMA using cloned HPV DNA
The third generation of the "VirID" RMA (v3.0) was used for the detection and identification of a broad panel of HPV sequences. Thirty-eight HPV sequences were selected and tiled on the microarray, including 32 from the alpha genus (HPV6, 11, 16 (HPV65, 88, 95, 101, 103, and 108). Reconstructed ancestral sequences (RASs) were designed through phylogenetic analysis of selected alpha genus sequences. Sequences at nodes at different phylogenetic depths on the tree were chosen as RASs (Fig. 1); we checked that all the RASs selected branched directly from, or very close to, their respective node, with near-zero branch lengths.
Of the 38 HPV sequences deposited on the chip, 11 alpha-HPV sequences and one gamma-HPV sequence were tested using cloned HPV genomes (Table 1): all 12 sequences were detected, with call rate values between 92.1 and 97.7% and a resequencing accuracy of 98.6 to 100%. Confirmation of the HPV genotype was obtained following BLASTN analysis, with scores ranging from 704 (for HPV54) to 809 (for HPV108) (Table 1).

Genotyping of HPVs with homonymous sequences
Because several sequences were tiled on the chip, we tested whether each HPV genotype could be detected following hybridization either to its homonymous sequence, or to related sequences belonging to the same or different species. Various cloned HPVs were tested, alone or in mixtures with HPVs belonging to the same or different species (Tables 2 and 3). HPV16 and HPV66 were perfectly identified by their homonymous sequences, whether they were on their own or in mixtures; they were both identified by non-homonymous sequences, including HPV30 and HPV33. HPV66 was also detected by the sequences of HPV53, HPV56 and HPV82, whereas HPV16 was detected by the sequences of HPV45 and HPV59 (Table 2). With a mixture of both HPV16 and HPV66, only the most closely related HPV was detected by a non-homonymous sequence (Table 2). HPV18 was tested alone and in mixtures with HPV45 or HPV53: the three viruses were perfectly detected by their corresponding homonymous sequences. Moreover, they were simultaneously identified by several other non-homonymous sequences (Table 3): for example, when two HPVs belonging to the same species (such as HPV18 and HPV45) were tested together, no competition was observed for their identification either by homonymous or related sequences. Mixtures containing HPV16, 51, 66, and 108 (Table 2) or HPV6, 18, 42, and 53 (Table 3) were similarly studied, and in all cases the various HPVs were distinguished and identified by both homonymous and non-homonymous sequences.

HPV genotyping with reconstructed ancestral sequences
We wanted to determine whether all HPVs belonging to a single species could be detected with a RAS specific for this species. Within any species, the RAS is more closely related to the sequences of the various HPVs than the sequences of the HPVs are to each other (data not shown). We tested HPV53, 56 and 66, all members of species alpha 6, and call rates of 59.1 to 75.7% were obtained with RAS-811 (Table 4). These values were much higher than those observed with non-homonymous sequences (29.5 to 46.0%). BLAST analyses of all the sequences obtained confirmed the identification of the genotypes in all cases.
We next investigated whether HPVs that belong to different species could be identified with RASs. We tested a mixture of HPV16, 51, 66 and 108 (mixture A) and one of HPV6, 18, 42 and 53 (mixture B) with the RASs tiled on the microarray (Table 5). HPV16, which belongs to the alpha 9 species, was detected by RASs of the same alpha 9 species (RAS 814 and 815) and by RASs designed for the alpha 1 (RAS 806) or alpha 5 (RAS 812) species. By contrast, the alpha 5 HPV51 and the alpha 6 HPV66 were only detected with RASs corresponding to the same species (RAS 813 and RAS 811, respectively). Similarly, HPV18, which belongs to the alpha 7 species, was detected with two RASs of the same species (RAS 809 and 810). HPV6, which belongs to the alpha 10 species, was identified by RASs of the same (RAS 810) and other (RAS 814 and 815) species. HPV42 and 53 were detected by RASs of species alpha 1 (RAS 806) and alpha 6 (RAS 811), respectively. HPV18 and 45 were tested together, and each HPV was detected by its corresponding RAS (RAS 809 for HPV45 and RAS 810 for HPV18) without any ambiguity (data not shown). These findings indicate that one single RAS can detect and identify several HPVs; however, the association of RASs with the homonym sequences improved the genotyping of HPVs on their own or in mixtures.

Detection and genotyping of HPVs in clinical samples by RMAs
We assessed whether this RMA method could be used to identify HPVs in clinical samples. A set of 40 DNAs extracted from cervical smears with normal (n = 3) and abnormal (8 Atypical Squamous Cells of Unknown Significance (ASC-US), 1 Atypical Squamous Cells cannot exclude HSIL (ASC-H), 11 Low-grade Squamous Intraepithelial Lesion (LSIL) and 17 High-grade Squamous Intraepithelial Lesion (HSIL)) cytology was provided by the French HPV Reference Laboratory (Table 6). For each sample, the HPV status was first determined with the PapilloCheck genotyping kit. Most of the samples contained HPV16 or HPV18 on their own or with other LR or HR HPVs. The copy number per cell was between 0.001 and 71 for HPV16 and between 2 and 362 for HPV18, as determined by quantitative real-time PCR (Table 6). The DNA preparations were treated with Plasmid-Safe DNase and amplified with phi29 polymerase, and then subjected to genotyping by RMAs with homonymous sequences and RASs. All assays with the PapilloCheck genotyping kit and RMAs were performed only once, except for sample no. 2012-312. This DNA was independently amplified and hybridized on RMAs three times. Call rate and standard deviation (SD) values for HPV16 were determined with homonymous, non-homonymous and reconstructed ancestral sequences. SD values ranged from 0.9 to 1.9% depending on the kind of sequence considered and were similar to values obtained with a triplicate assay using cloned HPV16 DNA (Table 7).
The PapilloCheck genotyping kit can detect and identify 18 HR HPV (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 53, 56, 58, 59, 66, 68, 70, 73, 82) and six LR HPV (HPV 6, 11, 40, 42, 43 and 54/ Table 6). The RMA method with homonymous sequences appeared to be highly sensitive: HPV16 was detected in a sample in which there was only 1 copy per 1000 cells (sample 2012-266). Eighteen HPVs not identifiable with the PapilloCheck genotyping kit were detected by RMAs in 14 samples (indicated in boldface in Table 6). In particular, two gamma HPVs (HPV 101 and 103) were identified in samples 2012-1931 and 2012-1649; these HPVs were first detected in cervico-vaginal smears [22]. HPV genotyping following RMAs by homonymous sequences and RASs gave compatible results, although fewer HPVs were identified with RASs, particularly when the number of HPVs detected by homonymous sequences was high. However, in one case, HPV39, first identified by PapilloCheck, was detected only by RASs (sample 2012-1945) (Table 6).

Table 5. Identification of HPVs by reconstructed ancestral sequences. The species corresponding to the RAS is given in parentheses. The call rate value for each RAS is indicated when a sequence was identified. doi:10.1371/journal.pone.0109301.t005

Discussion
Multiple broad-spectrum PCR methods have been developed for the detection of the alpha-HPV genus and most are based on HPV sequences in the L1 open reading frame. Here, we show that resequencing microarray (RMA) technology was feasible and effective for genotyping human papillomaviruses, and is thus a potentially valuable tool for epidemiology. We have demonstrated that the use of homonymous HPV sequences and reconstructed ancestral sequences (RASs), either separately or combined, allows precise molecular identification of different HPV genotypes. We validated the method first with cloned HPV DNA and subsequently for detection of HPVs in clinical samples. The RMA method used involved treatment of cellular DNA with Plasmid-Safe DNase and the universal Phi 29-based amplification; this methodology was highly sensitive for the genotyping of HPVs in clinical samples, with a detection limit of one copy of HPV16 per 1000 cells.
This RMA method allows the detection and characterization of HPV genotypes other than those tiled on the microarray. Indeed, the call rate values were a function of the percentage of divergence between the tiled sequence and the sequence present in the sample. We found that there was no substantial loss of detection signal when the tiled sequence diverged by up to 10-15% from the sequence of the pathogen in the sample. Moreover, the RASs were designed to minimize the divergence between the tiled sequences and the viral sequences potentially present in clinical samples. Indeed, the use of RASs improved the detection of several HPVs. Nevertheless, the nucleotide diversity of HPV sequences is substantial, and no single RAS was able to detect all the different HPVs present in cases of multiple infection.
We compared this RMA method to a commercial HPV genotyping kit. It must be stressed that clinical samples with more than one HPV were intentionally selected to compare the two methods. Some HPV types identified in clinical samples by the RMA method were not identified by the PapilloCheck test. This discrepancy is in part explained by the number of HPV genotypes, limited to 24, which can be identified by the PapilloCheck test. The HPVs detected by RMAs but not by PapilloCheck tests include both LR HPVs (HPV 32, 34, 54, 74, and 91) and HR HPVs (HPV30, 34, and 67). Also, HR HPV35, 73, and 82, found in some clinical samples by RMAs, were not detected by the PapilloCheck test. Conversely, HPV33, 39, 52, 56, and 70 identifiable in some samples by the PapilloCheck kit were not found by the RMA method. This presumably reflects different sensitivities of the two tests for some HPV genotypes.
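A per-sample comparison of this kind can be expressed with simple set operations. The sketch below is an illustration only: the function name and the example genotype sets are hypothetical, and the category labels rest on an assumption that is not stated in the text, namely that "concordant" means identical genotype sets and "compatible" means at least one shared genotype.

```python
def compare_genotypes(rma: set, papillocheck: set) -> dict:
    """Compare the HPV genotypes reported by two assays for one sample.

    Assumed (hypothetical) definitions:
      concordant  - both assays report exactly the same genotypes
      compatible  - the assays share at least one genotype
      discordant  - no genotype in common
    """
    shared = rma & papillocheck
    if rma == papillocheck:
        status = "concordant"
    elif shared:
        status = "compatible"
    else:
        status = "discordant"
    return {
        "status": status,
        "shared": sorted(shared),
        "rma_only": sorted(rma - papillocheck),
        "papillocheck_only": sorted(papillocheck - rma),
    }


# Hypothetical sample, not taken from Table 6.
result = compare_genotypes({16, 35, 67}, {16, 33})
```

Listing the genotypes found by only one assay separately makes it easy to check afterwards which of them fall outside the 24-genotype panel of the commercial kit and which point to a difference in sensitivity.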
Our study has some limitations. First, due to the substantial genetic diversity of HPVs, with more than 170 genotypes already identified [1], not all alpha or gamma HPVs were studied, and HPVs belonging to the beta, mu and nu genera were not studied. Nevertheless, we were able to identify HPV101 and 103, which belong to the gamma 6 species and were initially detected in cervicovaginal smears [22]. To increase the coverage of the known HPV diversity, and to facilitate detection of novel genotypes, the number of RASs should be increased. Second, in the current approach, only a part of the L1 gene of each HPV genotype was targeted and tiled on the microarray. The L1 gene may be deleted from the virus associated with some genital cancers, and this would lead to false-negative results. The E6 and E7 genes have been described as highly expressed in high-grade cervical lesions and cancer. Therefore, new-generation RMA tests could include sequences corresponding to the E6 and E7 genes, which would allow detection of both viral DNA and transcripts. The cost of a high-density microarray could be seen as a limitation because this technology requires the synthesis of a physical mask. However, the final cost may decrease if demand increases, as may be the case in the context of clinical diagnostic use. Moreover, a smaller RMA dedicated solely to the detection and characterization of HPVs would help to reduce the overall cost of the assay. Adding supplementary homonymous and reconstructed ancestral sequences would give developing countries a better tool for HPV screening. Based on the sequences obtained, RMAs may be useful for molecular epidemiological studies of HPV infection, particularly in geographical areas where the distribution of circulating genotypes has not yet been investigated. They can be used to identify variants of HPV genotypes that differ from the prototype by up to 10-15%.
RMA-based tests may thus be informative about the prevalence and distribution of diverse HPV genotypes in the post-vaccination era.