Characterization of Human Papillomavirus Type 154 and Tissue Tropism of Gammapapillomaviruses

The novel human papillomavirus type 154 (HPV154) was characterized from a wart on the crena ani of a three-year-old boy. It was previously designated as the putative HPV type FADI3 by sequencing of a subgenomic FAP amplicon. We obtained the complete genome by combined methods including rolling circle amplification (RCA), genome walking through an adapted method for detection of integrated papillomavirus sequences by ligation-mediated PCR (DIPS-PCR), long-range PCR, and finally by cloning of four overlapping amplicons. Phylogenetically, the HPV154 genome clustered together with members of the proposed species Gammapapillomavirus 11, and demonstrated the highest identity in L1 to HPV136 (68.6%). The HPV154 was detected in 3% (2/62) of forehead skin swabs from healthy children. In addition, the different detection sites of 62 gammapapillomaviruses were summarized in order to analyze their tissue tropism. Several of these HPV types have been detected from multiple sources such as skin, oral, nasal, and genital sites, suggesting that the gammapapillomaviruses are generalists with a broader tissue tropism than previously appreciated. The study expands current knowledge concerning genetic diversity and tropism among HPV types in the rapidly growing gammapapillomavirus genus.


Introduction
The papillomaviruses (PVs) are small viruses with icosahedral symmetry and a circular, double stranded genome [1]. These viruses are widely distributed across vertebrates, and among humans alone, more than 170 unique complete genomes of types have so far been sequenced [2]. The papillomaviruses are epitheliotropic and produce hyperproliferation of squamous cells known as papillomas. Human papillomaviruses (HPV) are divided into high-risk types, which are etiologically associated with cancer of the cervix uteri; and low-risk types, which produce benign warts [3,4]. Regarding the taxonomy of papillomaviruses, the International Committee for the Taxonomy of Viruses (ICTV) recommends inclusion of both genotypic and phenotypic information in the definition of viral genera and species [5]. Nevertheless, papillomaviruses have been an exception to the classical rules, and a system based mainly on sequence identity was adopted (reviewed in de Villiers, 2013). Accordingly, a 70% sequence identity amongst L1 ORFs is used as cut-off guide to classify viruses as belonging to the same species [6]. Even so, there is a gray zone (67.5%-70.5% identity) where the distribution curves for interand intraspecies identities overlap, and consequently the classification must be curated [7]. The papillomaviruses have in the past been classified as mucosal or cutaneous according to their tropism. The human cutaneotropic papillomaviruses are represented mainly by the genera Betaand Gammapapillomavirus (b-PV and c-PV). Their prevalence and natural history, for example acquisition soon after birth, reflects a commensalic interaction with the immunocompetent host [8][9][10]. Historically, several cutaneous papillomaviruses have been isolated from patients with the genetic disease epidermodysplasia verruciformis [11,12], and from immunosuppressed patients [13], probably due to higher viral loads among these patient groups [14]. As a consequence of improved methods, several PVs have been characterized, and the number of HPV types of the c-PV genus has been expanded from 16 HPV types in 2010 [7] to 62 representative HPV types in 2013 (retrieved from GenBank, September 2013). A review of the isolation sources of these HPV types could lead to increased knowledge of their tropism. Here, we categorized the different isolation sites of the c-PV as genital, oral, nasal or cutaneous (including skin lesions and healthy skin). The aim of the study was to obtain the complete sequence of a novel papillomavirus, and to present a summary of the isolation sources of the rapidly growing genus Gammapapillomavirus.

Results
HPV154 was isolated from a wart on the crena ani of a threeyear-old boy. The index sample was negative for HPV by PCR using MGP primers [15] and a Luminex system [16], but positive for HPV by FAP-PCR [10]. The FAP amplicon was sequenced and showed the highest identity to the FADI3 fragment (99.3%, acc. no. FJ480954). The FADI3 was originally amplified from a forehead swab of a six-year-old girl [17].
The FAP amplicon represented a putative novel type, as the partial L1 ORF was below 90% of sequence identity to any other known PV type [6]. With or without pre-amplification by rolling circle amplification (RCA), we failed to amplify the complete genome by long-range PCR or with PCRs designed to cover half the genome by a combination of L1-specific primers with degenerated E1 primers (data not shown). As a consequence, we adapted the method for detection of integrated papillomavirus sequences by ligation-mediated PCR (DIPS-PCR) [18] to obtain sequence information outside the FAP amplicon region of the L1 ORF. In order to test the performance of the DIPS-PCR method, we verified the sequence around the integration site of HPV16 into the genome of SiHa cells (data not shown). Several DIPS-PCRs and PCR with degenerated HPV primers were used to obtain sequence data from L2 to almost the end of L1 (Figure 1). Proximal to the ends of that region, new primers were designed that were combined with degenerated primers based on related c-PVs, and two amplicons of ,2000 bp were cloned and sequenced. In order to obtain the complete genome, four overlapping amplicons were obtained by PCR with specific primers (Figure 1). The viral load of HPV154 was 68 genomes per human cell ( Table 1).
The four clones were submitted to the International Reference Center for Human Papillomaviruses at the German Cancer Research Center, Heidelberg, Germany, and the compiled sequence was verified and officially designated as HPV154 (GenBank JN211193).
The genome of HPV154 comprised 7286 bp with a GC content of 37.9%. The most closely related type was HPV136, isolated from the oral cavity [19], with 68.6% identity at the L1 ORF, whereas the uncloned HPV isolate KN3, obtained by high throughput sequencing from healthy skin [20], showed 71.8% identity at L1.
HPV154 demonstrated the classical genomic organization of PVs, with seven ORFs identified ( Figure 1). The putative E6 protein had two zinc-finger domains (CX 2 CX 29/30 CX 2 C) that were separated by 36 amino acids, which are conserved among PVs [21]. Similarly, there was one zinc-finger domain in the inferred E7 protein, as well as the tumor suppressor (pRB) binding domain (LXCXE) [22,23]. In the putative E1 protein, the superfamily 3 ATP-dependent helicase domain was identified [24,25]. The theoretical E2 protein had the typical C-terminal DNA-binding domain and the N-terminal trans-activation domain [26,27]. The E4 ORF showed a start codon; nevertheless it was ignored as we identified the characteristic donor (AAG/ GUASNR) and acceptor (GUYACYAG/YU) RNA splicing sites [28], and the resulting putative E1 ' E4 fusion protein with six amino acids from the E1 N-terminal end.
The early polyadenylation site (AATAAA) for processing of early mRNAs, was located at the 59 end of the L2 ORF, while the late polyadenylation site was downstream of L1 within the upstream regulatory region (URR). The URR was relatively short, being 517 bp, and contained six putative E2 binding sites (E2BS, ACCN 6 GGT). Near the E2BS proximal to E6, several E1 binding sites (AACAAT) or related AT-rich sequences were identified and probably represent the origin of replication [29,30]. A TATA box was found (pos. 6973-6977) and surrounded by two E2BS in close proximity (2 and 14 bp).
As the FADI3 amplicon and HPV154 were obtained from samples from children, we tested whether this PV type might have an increased presence in this group. Among the cutaneous swab samples, HPV154 was detected in 3% (2/62) by real-time PCR. One of the samples showed a viral load of 0.093 genomes and the other 1.3 genomes per human cell ( Table 1).
The phylogenetic relationships of HPV154 were inferred based on the complete genomes of 91 HPV-sequences ( Figure 2). HPV154 was positioned on a divergent group, along with the members of the currently proposed c-PV11 species (pending ICTV approval) ( Figure 2). HPV154 is below the 70% identity limit suggested for species definition compared to its closest relative type, HPV136 (68.6%). Moreover, the lowest values of identity for HPV154 within c-PV11 ( Figure 2) are with HPV140 (65.37%) and HPV169 (65.86%).
In order to increase knowledge of the biological niches of gammapapillomaviruses, we enumerated the isolation sources (complete HPV-genomes) and detection sites of all HPV types from this genus ( Table 2). The c-PV1 species contains members isolated from a wide variety of sites, including healthy tissue such as the skin, nasal cavity and male genitalia. However, they are also found in non-healthy tissue such as common warts, pigmented verrucas, squamous cell carcinoma (SCC) and actinic keratosis (AK). In a similar fashion, HPV types of the c-PV3, c-PV7, c-PV8, c-PV9, c-PV10, c-PV11 and c-PV15 species have been found in several different sites, including both mucosal and cutaneous sources ( Table 2). The c-PV6 members (HPV101, 103 and 108) are the only c-PV restricted to genital and oral mucosal sites (six isolates). On the other hand, the c-PV12 members have been isolated only from warts and healthy skin (eight isolates). The remaining species, which include c-PV2, c-PV4, c-PV5, c-PV13, c-PV14, c-PV16, c-PV17 and the six putative novel species ( Figure 2), have members with less than three findings, but include both mucosal and cutaneous sites ( Table 2).

Discussion
From a wart on the crena ani of a three-year-old boy we characterized the novel HPV154, which was added to the rapidly growing genus Gammapapillomavirus. The genome of the HPV154 demonstrated the typical genomic organization of papillomaviruses and was lacking the E5 ORF, as with c-band m-HPVs [31]. The prevalence of HPV154 was 3% in skin swabs from healthy children, which is in a similar range as other HPV types of the beta and gammapapillomavirus genera detected among adults [32][33][34][35][36][37][38]. A limitation of the study was that only samples from children were used in the screening of HPV154. Another limitation was that HPV154 was characterized from a swab sample, so we cannot assert that the virus caused the lesion. Our initial approach of obtaining the complete HPV genome by long-range PCR failed, which may be due to the low concentration of HPV154 (Table 1) and/or fragmentation of the genome. Instead we used PCRs with degenerated HPV primers and an adapted version of the DIPS-PCR method [18]. It was notable that several sequence reads of the restriction enzyme site for the adapter ligation of DIPS-PCRs could not be verified in the final sequence, which might indicate star activity of the restriction enzyme prior to ligation of adapters. Thus, compiled sequences from DIPS-PCR should be verified by independent PCR. Nevertheless, we showed that the DIPS-PCR can expand the length of sequence information, which in turns allows to design HPV primers at positions that reduce the size of the targeted long-range amplicons, which would probably increase amplification efficiency. However, our approach is laborious and time-consuming compared to modern methods such as highthroughput sequencing, and would be a feasible alternative only in cases where the latter methods are not available or other classical methods such as RCA and long-range PCR have failed.
HPV154 demonstrated the typical genomic organization of papillomaviruses, but lacked the E5 ORF as in all known members of the b-cand m-PV genera [31]. The prevalence of HPV154 (5%) in skin swabs from healthy children was in a similar low range as for other HPV types of the band c-PV genera detected among adults [32][33][34][35][36][37].
Concerning the presented phylogenetic tree, HPV154 was placed with high support along with members of the currently proposed c-PV11 species [2]. Nevertheless, the percentages of L1 ORF sequence identity among different members of this group were closer to the interspecies center of the distribution (64.5%) than to the intraspecies center (72%) as shown in the histograms of L1 sequence identity from 189 PVs [7]. Even using 68.5% of identity as a more stringent/divergent limit for inter-species distance, three different species could be delimited, as shown with a gray dotted key in Figure 2.
Accordingly, a more stringent/divergent limit of 68.5% identity allowed us to split the proposed c-PV11 into three putative novel species. However, the proposed c-PV11 [2] were based on HPV types with tropism for the oral cavity, but additional HPV types within this species had been isolated from healthy skin and warts as in the case of HPV154 (Table 2). According to ICTV ''A species is a monophyletic group of viruses whose properties can be distinguished from those of other species by multiple criteria'' (http://www.ictvonline.org/codeOfVirusClassification.asp), but among a majority of the c-PV species it is difficult to find distinctive characteristics besides sequence identity. Although species definition for c-PV is currently based mainly on sequence identity [2,6,7], the criteria have varied over time, and hence have not been applied homogeneously to all members of the genus. For example, HPV48 (c-PV2) shares 68.01% identity with HPV50 (c-PV3) and 69.92% with HPV131 (proposed c-PV14). A system with well supported phylogeny would help to define species, and it may be safer to leave new members unclassified at the species level until more isolates are sequenced.
With regard to the isolation sites enumerated in Table 2, it seems that c-PVs can be found in a broader variety of sites than previously appreciated. Diverse sites are found for members of the same species or even type. Even though very few studies have searched for papillomavirus in the oral or nasal cavities [19,[39][40][41], members belonging to nearly every species have been identified in those studies. Altogether, this supports the idea that c-PVs are generalist with various forms of tropism. The only exceptions seem to be the members of the c-PV6 species, HPV101, HPV103 and HPV108, as they appear to have only mucosal tropism. This group also distinctively lack the E6 ORF and have been associated with disease [42,43].
However, our approach to summarize isolation sites of gammapapillomaviruses described in reports provides only a suggestive mode to study the tropism of these viruses. In order to perform a rigorous evaluation of the tropism for each HPV type, a random population study with samples collected from several sites is needed.
In this study we successfully characterized the novel HPV154, thereby expanding knowledge about the diversity and tropism of HPV types in the rapidly growing c-PV genus. In addition, we suggest that gammapapillomaviruses are generalist with broad tissue tropism. We expect that improved amplification methods will promote the discovery of additional HPV types of the c-PV genus, and will shed light onto its apparent wider tropism compared to other papillomavirus genera.

Ethics Statement
Ethical approval for this study was granted by the Ethical Committee of Lund University (LU 106-01). The samples were obtained with written informed consent from the parents of the minors.

Sample Processing: DNA Extraction
A swab suspension of 200 ml was processed using the automated MagNA Pure LC with the Total Nucleic acid kit (Roche), and eluted in 100 ml.

RCA
The eluted total DNA was subjected to rolling-circle amplification (RCA) using illustra TempliPhi 100 Amplification Kit (GE Healthcare), basically following the manufacturer's instructions, but in a slightly modified form. As indicated elsewhere [45], the final concentration of each dNTP was increased to 450 mM and the reaction was incubated overnight.
'' '' Nasal cavity [39] '' '' Skin [9] FA35 (96.8) '' CG1`Skin [20] product previously digested was combined with the appropriate (enzyme specific) double stranded adapter (50 pmol) and ligated by adding ATP (0.5 mM), DTT (10 mM) and 1.75U of ligase (Fermentas) to a final volume of 27 ml. The reaction was incubated at 16u overnight, and later diluted to 40 ml with water and was used as template for all subsequent PCR steps. Two microliters of the ligation products were subjected to an initial linear PCR amplification using a viral specific primer (0.2 mM) in 1x PCR buffer (Roche), 1.8 mM MgCl 2 , 0.2 mM each dNTP, 1.25 U of Taq polymerase (AmpliTaq Gold, Roche). The cycling conditions were 94uC 5 min; 45 cycles of 94uC 15 s, 50uC 20 s, 72u 3 min; 72u 5 min. A second PCR (exponential) using 2 ml of the linear PCR as a template was made using a nested viral-specific primer (primer sequence available upon request) combined with the adapter-specific ''AP1'' primer (ggccatcagtcagcagtcgtag) in the same conditions as the linear PCR but with 0.5 mM of both primers. The cycling conditions were 94uC 5 min; 35 cycles of 94uC 15 s, 55uC 20 s, 72uC 3 min; 72uC 5 min. The PCR products were evaluated by electrophoresis, and the longest product (from the different restriction enzymes) was selected, cloned using a TOPO-TA cloning kit (Invitrogen) according to the manufacturer's instructions, and sequenced at MWG (Germany). The novel sequence was used to design primers with Oligo 7 software (Molecular Biology Insights, USA) in order to carry out successive steps of linear/exponential PCR.

PCR with Degenerated Primers and Overlapping Amplicons
Primers were designed in the L2 ORF from a multiple alignment of HPV sequences (HPV48, 50, 60, 65, 88, 95, 112 and 116) related to the known FAP region of HPV154. The degenerated primer ''L2 gamma F'' (tttgratwtgaaaatcccgccttt) was used in conjunction with the HPV154 specific primer ''PV77 FADI3 R'' (gacctgtgctaccgactccaag). The cycling conditions were 94uC for 5 min; 40 cycles of 94uC 15 s, 52uC 20 s, 72uC 1 min and a final extension of 72u 7 min. The PCR product was purified with an Illustra Microspin S-300 HR column (GE Healthcare) and sequenced with a nested specific primer.
A second set of degenerated primers was designed for the E1 ORF. The primers 'E1 gamma Fnew' (gacagtggdatwkddgaagatgaa) and 'E1 gamma Rnew' (ttcatcttcddmwatdccactgtc), were combined with 'FADI3 22 nes R' and 'FADI3 +2 nes F', respectively (0.3 mM final concentration). One microliter of RCA-pre-amplified template was subjected to Long Range PCR (Expand Long Template PCR System, Roche) amplification in 1x PCR buffer 2 (Roche), 0.5 mM each dNTP, 0.7 U of Enzyme mix (Expand Long Template PCR System, Roche). The cycling conditions were 94uC 2 min; 10 cycles of 94uC 15 s, 50uC 1 min, 68uC 4 min and 30 cycles of 94uC 15 s, 59uC 30 s, 68uC 4 min. The product was electrophoresed and the bands excised from the agarose gel, purified with a QIAquick Gel Extraction kit (Qiagen) and cloned using a TOPO-TA cloning kit (Invitrogen), according to the manufacturer's instructions.
The remainder of the genome was amplified using an Expand High Fidelity PCR system (Roche, Mannheim, Germany) with 2.5 ml of the RCA amplified DNA (diluted 1:100) as a template; 0.3 mM of each primer, 3.5 mM MgCl 2 and 1.24 U of enzyme in a final volume of 50 ml. The cycling was 94uC 2 min; 40 cycles 94uC 15 s, 55uC 20 s, 72uC 2 min increasing 5 s/cycle; 72uC 10 min. The product was purified and cloned as indicated before.

Screening of Samples from Children
Forehead samples from children from three age groups, one month (23), one year (19), and four years old (20), were obtained from a previous study [9] and used as templates for the screening. The samples were swab suspensions in 0.9% NaCl, and were used directly as PCR templates. Primers intended for real-time PCR were designed using Oligo 7 (Molecular Biology Insights, USA) and tested for possible cross-reactivity with other types using Primer-BLAST [46]. The resulting primers were 'HPV154 5511 For data analysis the ABI 7500 software v2.0.6 was used and the threshold for positivity was automatically calculated. The specificity of the HPV154 PCR was also analysed by gel-electrophoresis for identification of the expected 137 bp amplicon. Plasmid DNA concentration of HPV154 clone L (Figure 1) was quantified using a spectrophotometer (NanoDrop ND-1000, Nanodrop Technologies, Oxfordshire, UK). The sensitivity of the method was evaluated using serial dilutions of the clone L (Figure 1), and 10 copies were detected in a background of 1 ng of human DNA (Sigma-Aldrich, art. D 7011). The positive HPV154 control had a Tm of 75.4C (CV: 0.2%, based on four measurements).

Viral Load of HPV154 Positive Samples
The number of viral genomes per cell was quantified by carrying out two separate quantitative real-time PCR assays to amplify a part of the HPV154 L2 gene and the human b-globin gene. For the quantification of HPV154 the primers and PCR conditions were identical to that used for screening of samples from children as described above. Quantification was extrapolated from a linear regression standard curve obtained from serial dilutions of 100,000 to 100 copies per PCR of HPV154 plasmid DNA (clone L, Figure 1) in a background of 10 ng human placenta DNA (Sigma-Aldrich, art. D 7011). The standard curve had a slope of 23.4, y intercept of 37.3 and r 2 of 0.99. The PCR efficiency calculated from slope was 97.5%. Similarly, in order to calculate the number of cells analyzed per sample, the b-globin gene was amplified with PC03 and PC04 primers in a 25 ml PCR reaction containing 2.5 mL template [44]. The standard curve was obtained from serial dilutions of 50,000 to 50 copies per PCR of the b-globin gene using human placenta DNA (Sigma, art. no D7011). The standard curve had a slope of 23.4, y intercept of 39.9 and r 2 of 0.99. The PCR efficiency calculated from slope was 97.5%. For the calculations of number of human cell per sample, the copy number of b-globin was divided by two. We assumed that each human cell carries two b-globin gene copies and that the diploid genome equivalent contains ,6.6 pg DNA. No-template controls of water samples were tested of both the HPV154 PCR and the human b-globin gene PCR. All samples were analyzed triplicate. Coefficient of variation was calculated for each triplicate measurement of viral copy number per human cell. In the quantitative PCR, negative controls showed no Ct values.

Bioinformatic Analysis
General sequence handling and feature identification were carried out using the UGENE software v1.11.5 (http://genome. unipro.ru). Putative proteins from ORFs were generated by UGENE and they were then searched for similarities with other proteins using BLASTp (http://blast.ncbi.nlm.nih.gov). Proteins were analyzed for unique domains with ScanProsite (http://www. expasy.ch/prosite) and SMART [47], including searches in the Pfam database (http://pfam.sanger.ac.se/search) [48]. A Python v3 script was made to compare pairwise identities from multiple sequence alignments (available upon request), in which all differences, including terminal gaps, were counted.

Phylogenetic Analysis
In order to avoid missing any gamma-related virus, all complete genomes were retrieved from GenBank (914 accessions), from which 723 unique genomes remained after dereplication using Usearch v6.0.307 [49]. A workflow made using UGENE software [50] was used to extract the FAP region from L1 ORFs, align it using uMUSCLE [50,51], and the resulting alignment was then converted to phylip format using ALTER website [52]. It was used later to make an ML-tree with RaxML (400 rapid bootstrap inferences and thereafter a thorough ML search). The branch containing all gamma-related viruses was selected using Dendroscope v3.1.0 [53] (excluding beta types). From those sequences, L1 was extracted, aligned, and dereplicated again to remove identical accessions or genomic variant sequences. The complete genomes of the remaining 91 unique sequences were manually edited to have L1 at the 3' end and then they were aligned with uMUSCLE. It was later used to make a Maximum Likelihood phylogenetic inference using RAxML v 7.4.2 [54], compiled under Linux as AVX and PTHREADS versions, and run using the raxmlGUI v1.3. Gaps were treated as missing data and the General Time Reversible (GTR) under the gamma model of rate heterogeneity was selected as a nucleotide substitution model to make 20 inferences with 150 thorough bootstrap replicates (automatically determined by the program using the majority rule tree based criteria, command ''-N autoMR''). The sequence of the betapapillomaviruses HPV5 (acc. no. M17463) and HPV9 (acc. no. X74464) was used to root the tree. A graphical representation of the tree was made with FigTree software v 1.3.1 [55] and edited with Inkscape v0.48. The GenBank accession numbers of the sequences in the tree are as follows: AsPV1, HQ625440; BPV11,