Unbiased Approach for Virus Detection in Skin Lesions

To assess presence of virus DNA in skin lesions, swab samples from 82 squamous cell carcinomas of the skin (SCCs), 60 actinic keratoses (AKs), paraffin-embedded biopsies from 28 SCCs and 72 kerathoacanthomas (KAs) and fresh-frozen biopsies from 92 KAs, 85 SCCs and 92 AKs were analyzed by high throughput sequencing (HTS) using 454 or Ion Torrent technology. We found total of 4,284 viral reads, out of which 4,168 were Human Papillomavirus (HPV)-related, belonging to 15 known (HPV8, HPV12, HPV20, HPV36, HPV38, HPV45, HPV57, HPV59, HPV104, HPV105, HPV107, HPV109, HPV124, HPV138, HPV147), four previously described putative (HPV 915 F 06 007 FD1, FA73, FA101, SE42) and two putatively new HPV types (SE46, SE47). SE42 was cloned, sequenced, designated as HPV155 and found to have 76% similarity to the most closely related known HPV type. In conclusion, an unbiased approach for viral DNA detection in skin tumors has found that, although some new putative HPVs were found, known HPV types constituted most of the viral DNA.


Introduction
Human papillomaviruses (HPVs) are a large and diverse group of viruses with more than 150 completely characterized types, with new HPV types being continuously found [1][2][3][4][5][6][7][8][9][10][11][12]. HPVs infect keratinocytes in mucosa or skin. There are five major HPV genera: Alphapapillomavirus, Betapapillomavirus, Gammapapillomavirus, Mu papillomavirus and Nu papillomavirus. HPV types belonging to different genera have less than 60% similarity, based on the nucleotide sequence of the capsid protein L1. Different viral species within a genus share between 60 and 70% similarity. A novel HPV type has less than 90% similarity to any other HPV type [13]. Novel HPV types are given a number only after the whole genome has been cloned and deposited with the International HPV Reference Center [6,13].
HPVs cause a wide range of diseases from benign lesions to invasive tumours [14,15]. The oncogenic mucosal HPV types in the Alphapapillomavirus genus are a major cause of cervical cancer [16], as well as anal, vulvar, and oral cancers [17]. There are also benign mucosal HPV types in the Alphapapillomavirus genus that cause benign genital condylomas [6].
NMSCs such as squamous cell carcinoma (SCC) and basal cell carcinoma (BCC) are two of the most prevalent cancers among Caucasian populations worldwide [22]. Ultraviolet radiation is a well-known risk factor [23], but there may also be other risk factors [24,25]. The increased incidence of SCC in immunocompromised individuals has suggested that an infection may be involved [24,[26][27][28], with HPV being the most commonly studied candidate infectious agent [20,29].
Skin samples from SCC, AK, seborrhoeic keratosis (SK), and BCC commonly contain multiple HPV types, but typically at very low viral loads [4,5,7,19,30,31]. Conventional methods for virus detection suffer from low throughput and are furthermore biased towards identifying sequences homologous to the PCR primers used. There are several examples of NMSC lesions that contain very high amounts of previously uncharacterized HPV types that may escape detection by PCR systems based on known HPV sequences [7,30]. This has made it difficult for epidemiologic studies to perform an unbiased assessment of which HPV types are present in healthy skin as compared to benign tumors or NMSCs.
High throughput sequencing (HTS) can be used for an increased sensitivity in HPV detection by sequencing of general HPV-PCR amplimers [11]. However, HTS can also be used to obtain a comprehensive and more unbiased sequencing of the DNA present in a sample, without any prior PCR or other prior information about sequences [32]. The HTS technology has been increasingly used for discovery of new viruses [12,[33][34][35][36]. The aim of this study was to apply the HTS technology for an unbiased analysis of which known and unknown HPV types are common in SCCs, AKs, and KAs.

Patients
We used four different patient series. One hundred and thirtyone patients registered with non-melanoma, non-basal cell skin cancer (ICD7ucode 191) in the Swedish Cancer Registry with formalin-fixed paraffin-embedded (FFPE) biopsies stored at the Pathology Department of Malmö University Hospital, Sweden, were identified and re-reviewed by an expert pathologist [37]. We included 28 specimens that were classified as SCC and 72 specimens re-classified as KA. The paraffin blocks were sectioned and DNA extracted as described [37].
At the Department of Dermatology and Plastic Surgery at the Norwegian National Hospital, Oslo, Norway, biopsies from 92 KAs from both immunosuppressed and immunocompetent patients were collected. The DNA was extracted using the QIAampDNA Minikit (Qiagen, Hilden) [19].
From a hospital-based case-control of NMSCs, premalignant and benign lesions [29], conducted in Sweden and Austria, we included samples collected from immunocompetent patients with SCCs (n = 85), AKs (n = 92), BCCs (n = 118) and SKs (n = 46). All patients provided four different samples: swab samples from the top of the lesion and from healthy adjacent skin, as well as a biopsy of the lesion and from healthy adjacent skin. The swab samples were collected by a pre-wetted (0.9% NaCl) cotton-tipped swab that was rolled on the lesion (within margins of the lesion) and suspended in 1 ml of saline.
From a Swedish hospital-based study, we included 69 immunocompetent patients (SCC (n = 35), AK (n = 22), BCC (n = 3), KA (n = 8) and SK (n = 1)). All patients donated biopsies and swab samples in the same manner as described above. For the latter 2 series the biopsies were taken from the skin after tape-stripping, to remove possible environmental contaminations [21].

Ethics statements
Written informed consent was obtained from participants. The study adhered to the declaration of Helsinki and was approved by the Ethical Review Committees of Karolinska Institute and of Lund University (Sweden), Medical University Vienna (Austria) and Institutional Review Board in Oslo (Norway).

Sample preparation
Three different sample preparation methods were used: E-gel followed by whole genome amplification (WGA), ultracentrifugation followed by WGA and direct WGA of the sample.
Separation of long chromosomal DNA from shorter DNA using E-gel. Two to four ml of each sample were mixed according to diagnosis and sample type to pools of 6 to 11 patients each. The pools were run on an E-gel (Invitrogen, Carlsbad, CA) where DNA was separated by gel electrophoresis. DNA in the size range from ,3 to 10 kb was eluted from the gel.
To obtain sufficient amounts of DNA for the 454 sequencing, we performed whole genome amplification (WGA) using Geno-miPhi High Yield (GE Health Care, Buckinghamshire, United Kingdom) on all pools described above. The manufacturers protocol was followed, with some modifications. Five ml of sample was mixed with 1 ml 106 BSA (New England Biolabs) and 22.5 ml of sample buffer. The mix was incubated at 95uC for 3 min and then cooled on ice. For the amplification reaction 22.5 ml reaction buffer was mixed with 2.5 ml enzyme mix and added to the samples, incubated at 30uC for seven hours and inactivated at 65uC for 10 minutes. Sequencing products were dissolved by diluting the samples 1:2 in TE-buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8.0) and rotated on an orbital shaker (500 rpm) over night. Prior to sequencing, the smaller pools of 6 to 11 patients were mixed into seven pools (28 paraffin embedded samples from  SCCs, 72 paraffin embedded samples from KAs, 92 fresh frozen  KA biopsies from Norway, swab samples from the top of the lesion  from 82 SCCs and 60 AKs, 41 fresh frozen biopsies from SCCs negative for HPV in a previous study [29], 44 fresh frozen biopsies from SCCs positive for HPV in a previous study [29] and 92 fresh frozen biopsies from AKs).
Separation of viral capsids from human chromosomes using ultracentrifugation. Swab samples from the top of the lesion from 82 SCC patients and 60 AK patients were prepared for density gradient ultracentrifugation by mixing 2 ml of each sample into 4 pools of about 35 patients each. Optiprep density gradient medium (Sigma) was used in concentrations of 39%, 33%, and 27%, as described [38]. The gradient layers had a volume of 300 ml. After adding the pools to the top of the gradient, the 5 ml polyallomer ultracentrifuge tube was filled with standard PBS. Centrifugation was performed with a Sorvall Discovery M120 SE centrifuge with swing-out rotor S52-ST at 50 000 g for 3:30 h. After the centrifugation, the bottom of the polyallomer tube was punctured with a needle and fractions collected. The first 100 ml fraction was discarded (where the human DNA is sedimented [39]) and the next fraction of 350 ml was collected (where virus particles but no human DNA is sedimented [39]). DNA was extracted with MagnNA Pure LC using the Total Nucleic Acid kit (Roche), Before sequencing, amplification using GenomiPhi High Yield was performed using five ml of the extracted DNA from each of the four pools.
Pooling of samples. DNA was amplified using GenomiPhi High Yield on each small pool of 8 to 23 patients before samples were pooled into seven larger groups (28 paraffin embedded samples from SCCs, 72 paraffin embedded samples from KAs, 92 fresh frozen KA biopsies from Norway, swab samples from the top of the lesion from 82 SCCs and 60 AKs, 41 fresh frozen biopsies from SCCs negative for HPV in a previous study [29], 44 fresh frozen biopsies from SCCs positive for HPV in a previous study [29], and 92 fresh frozen biopsies from AKs) and sent for 454 sequencing. The pool of swab samples of SCCs & AKs was also sequenced using Ion Torrent 300 and 400 bp sequencing kits.

Sequence analysis
Sequences were obtained from the 454 Genome Sequencer FLX Titanium ( Roche) at the Science for Life Core Facility in Stockholm, Sweden and Ion Torrent PGM sequencer (Life Technologies) at the Science for Life Core Facility in Uppsala, Sweden. Multiplex identifiers (MIDs), included in the 454 adaptors, were used to assign the sequences obtained to the originating sample.
Duplicated sequences were identified using cdhit-454 algorithm [40] and were removed from further analysis. The remaining sequences were quality checked and base pairs with Phred quality score of less than 15 were defined as ambiguous. A Phred quality score of 15 equals to a base calling accuracy of 97% or 3 errors in 100 bases [41]). Tails containing ambiguous base pairs were trimmed off. Thereafter, sequences with the length of $80 base pairs and containing unambiguous base pairs of at least 80% of their length were selected for further analysis.
The selected high quality reads were compared to viral, human and bacterial DNA in GenBank, as well as to sequencing reads from negative control samples (water) [39] using the SSAHA2 software [42]. Reads with at least 90% identity over 80% of their length to human or bacterial DNA or to sequencing reads from negative control samples (water) were removed from further analysis. Remaining sequences were processed for assembly of contiguous sequences (contigs). Genome assembly was performed de novo using the sequence assembly software MIRA (minimum overlap of 40 base pairs with at least 98% identity) [43] and Celera Assembler's CABOG pipeline (minimum overlap of 40 base pairs with at least 98% identity) [44]. CABOG generated longer and fewer contigs than MIRA and all final analyses were conducted using CABOG. After manual inspection and finishing of assembly using the visual analytics tool Hawkeye [45], all contigs and unassembled sequences (singletons) were compared against GenBank nucleotide database using the BLASTn algorithm [46] (reward for nucleotide match = 1; penalty of nucleotide mismatch = 1; cost to open a gap = 0; cost to extend a gap = 2; e-value#e 24 ) to classify them as papillomaviruses-related or not. To identify possible artifactual ''chimeric'' sequences, that contain sequences originating from different viruses, all papillomavirusesrelated contigs and singletons were checked as described [36]. Shortly, the sequence that aligned to its most closely related sequence in GenBank was divided into three equal segments. If at least one of the segments differed in similarity to the corresponding overlapping parts with more than 5% (for example if segment 1 was 88% similar and segment 2 was 94% similar) the sequence was considered as a ''possible chimera''.
All analyses were performed using in-house R (www.R-project. org) and python (www.python.org) scripts. Coverage plots were generated using Circos visualization tool [47].

Screening of HPV155 and SE46
Prevalence of HPV155 (SE42) and SE46 was investigated using . Standard curves used serial dilutions from 500,000 to 0.5 copies/PCR of purified viral DNA from plasmids containing the real-time PCR-fragment of HPV155 or SE46 in a background of 10 ng/ml human placental DNA (Type XIII, Sigma-Aldrich). The PCR mixtures were prepared in a clean room, and contained in a total of 25 ml; 2.5 ml sample, 0.4 mM of each primer, 0.04 mM MGB-probe, 1x TaqManH Universal PCR Master Mix (Applied Biosystems) in sterile water (Sigma-Aldrich, Steinheim, Germany). Samples were added to the PCR mixture in a separate room. Water and TEbuffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8.0) were used as non-template control in each run. The PCR-analyses were carried out in a 7500 Fast Real-Time PCR System, using the 7500 Software v.2.0.5 (Applied Biosystems), with the following temperature settings; 2 min at 50uC and 10 min at 95uC, followed by 50 cycles at 95uC for 15 s and 60uC for 1 min. The threshold was set to 0.1 DRn (Rn is the fluorescence of the reporter dye divided by the fluorescence of the passive reference. For DRn the baseline fluorescence has been subtracted). Samples were run in triplicate and scored positive if at least two tests became positive.
The mean viral copy number of HPV155 and SE46 were calculated, and was for the biopsies normalised against the amount of human cells as determined in a single test by the use of a realtime PCR of the human b-globin gene [48]. The b-globin assay used a standard curve with serial dilutions from 100,000 to 1 copy per PCR of human placental DNA (Type XIII, Sigma-Aldrich) diluted in sterile water (Sigma-Aldrich).
The fragments were cloned using the TOPO TA Cloning kit (Invitrogen) using the pCRH2.1-TOPOH vector, according to manufacturer's instructions. The fragments were sequenced using both conventional primer walking and using the 454 FLX sequencer GS Junior (Roche). The 454 sequencing of the HPV155 plasmids was performed in-house, using the manufacturers instructions. The primers for the Sanger sequencing were from Eurofins MGW Operon (Germany).

Sequence analysis
The sequencing using 454 technology of swab samples from 82 SCCs and 60 AKs, paraffin-embedded biopsies from 28 SCCs and 72 KAs and fresh-frozen biopsies from 92 KAs, 85 SCCs and 92 AKs obtained 1,359,108 sequence reads. Ion Torrent PGM sequencing of swab samples from 82 SCCs and 60 AKs, using 300 and 400 bp sequencing kits obtained 912218 and 381017 sequencing reads, respectively (Table 1). After removal of duplicated sequences and filtering based on quality scores and read length, 1,116,074, 717880 and 274090 reads remained for 454, Ion Torrent 300 bp kit and Ion Torrent 400 bp kit respectively. They were classified into ''human'', ''bacterial'', ''viral'', ''negative control (water)'' [39], ''other'' and ''unknown'' ( Table 1). As ''other'', we have classified reads with similarities to other sequences present in GenBank (e.g. plants, plant viruses, synthetic constructs). Sequences, which did not give any hits to sequences deposited in GenBank, were classified as ''unknown'' (Table 1). Previously, we reported presence of bacterial related sequences and sequences classified as ''other'' and ''unknown'' in negative control samples (water) after 454 sequencing [39]. Such background sequences might be present due to background reactivity of Phi29 polymerase reaction [49] or environmental contamination. As water controls were found to be uniformly negative for viral sequences [39], we removed all sequence reads that were related to the sequences found in water control.
We also identified sequences from two HPV types only distantly related to any known HPV types (designated as SE46 and SE47) ( Table 2). Twenty-two reads from the pool of swab samples from the top of SCC and AK lesions formed the SE46 contig (GenBank accession number: JX198657), which was most closely related to the Gammapapillomavirus-type HPV112 in the E2 ORF of the genome. Two SE46 related singleton sequences were also found in fresh frozen samples from patients with KA. Ion Torrent 300 bp kit sequencing identified 132 reads related to SE46, which was assembled into two 950 bp and 3774 bp long contigs. SE47 was most closely related to the Alphapapillomavirus-type HPV94. Two  Checking of sequences for possible artifactual sequences composed of sequences from several different viruses (''chimeric sequences''), by comparing segments of the sequences for differential similiarities against the most closely related sequences in GenBank did not reveal any possible ''chimeric'' sequences.

Sensitivity analysis
In order to obtain an approximate estimation of the sensitivity of the 454 HTS method, we analysed a clinical sample pool where eight reads of HPV16 were found. Real-time PCR determined that after WGA-amplification the sample contained 17 million copies of HPV16/microliter. We also performed real-time PCRs for HPV155 and SE46 on 142 clinical samples (Table 3). Only 4/ 142 and 5/142 samples were positive for HPV155 and SE46, respectively. The amount of virus present in the unamplified pool of these samples would have been 46,5 copies/2.5 ul input volume and 1,1 copy/2.5 ul input volume, respectively. All 142 samples were WGA-amplified separately and pooled after amplification. The amounts of HPV155 and SE46 in the pool were 660 000 and 3,3 million copies/ul, which resulted in 156 and 22 reads in the 454 sequencing, respectively ( Table 2).
To investigate which amount of HPV16 copies per cell that would result in comparable amounts of virus after WGA amplification, we performed a titration series using the cervical cell line CaSki (containing about 500 copies HPV16/cell) diluted in HPV-negative C-33A cells. An input of 10% Caski cells was found to result in a concentration of about 165 million copies of HPV16 per microliter after whole genome amplification. 1% CaSki yielded 4.2 million copies/ul, 0.1% CaSki yielded 1,2 million copies/ul and 0.01% CaSki yielded about 220,000 copies/ul, suggesting that the 1% CaSki dilution (about 5 copies/cell) would have been detectable by the 454 sequencing.
As there appears to be a substantial variability in sensitivity, both in the WGA amplification and in the 454 sequencing step, the number of reads obtained by HTS should not be considered as a quantitation of the amount of virus present in the sample.

Characterization of HPV155
The complete genome of HPV155 (SE42) was cloned from a swab sample from the top of an AK lesion. The clone and the sequence (obtained by primer walking) were sent to the International HPV Reference Center at the German Cancer Research Center in Heidelberg, where it was established as a new HPV type and given the designation HPV155 (GenBank accession JF906559). The HPV155 comprised 7352 bp and belonged to the genus Gammapapillomavirus but demonstrated only 76% similarity to the most closely related type (HPV149) ( Table 4). In addition, sequencing of the HPV155-clone using the 454 FLX GS Junior and assembling with the Phred quality score cutoff of 30 (1 error in 1000 bases [41]) resulted in a 100% sequence identity to the sequence obtained by primer walking. To compare with the sequence quality from the original sample, we also re-assigned the 156, 1199 and 255 reads from the 454, Ion Torrent 300 bp and 400 bp kit sequencing runs, respectively related to HPV155 genome, to the sequence obtained by primer walking (Figure 1). Although these had been assembled with a Phred quality score of 15, still 145/156 reads, from 454, 1183/1199 from 300 bp Ion Torrent and 252/255 from 400 bp had high quality alignments (Figure 1). Two HPV155-related contigs from 454 and one contig from each Ion Torrent sequencing run showed 99,3% to 99,89% identities to the sequence obtained by primer walking. Table 2. Cont. Overall, HPV155 was detected among 7 of the 341 patients (Table 3). HPV155 was found in 0.6% (2/341) of both the lesions and the healthy skin biopsies. HPV155 was found in 1.1% (4/341) and 2.1% (7/341) of swab samples from top of lesion and healthy skin, respectively. Among the biopsies, the viral load of HPV155 was below 3.8610 24 copies per cell. The quantity of HPV155 in the top of lesion swab samples varied between 40 to 10251 copies/ 2.5 ul whereas the corresponding figures from healthy skin were 0.5 to 7235 copies/2.5 ul (Table 3).
SE46 was detected in 7 of the 341 patients (Table 3). SE46 was found in one biopsy (0.2%, 1/341) from the healthy skin with a viral load of 3.5610 24 copies per cell. SE46 was found in 1.5% (5/ 341) and 2.1% (7/341) of swab samples from top of lesion and healthy skin, respectively. The quantity of SE46 in the top of lesion swab samples varied between 11 to 23 copies/2.5 ul whereas the corresponding figures from healthy skin were 2.5 to 110 copies/ 2.5 ul (Table 3).
One SCC patient was positive for both HPV155 and SE46 in the biopsy of the healthy skin and in the swab samples from both tumour and normal skin, but the SCC biopsy was negative for both viruses. Eight patients with HPV155 or SE46 positive swab from top of the lesion were also positive in the corresponding swabs of healthy skin (Table 3).

Discussion
The present study investigated the presence of HPVs in a variety of skin lesions using HTS, a method that is independent of any prior knowledge of virus sequences. The majority of the viral sequences found in this study (97%) originated from different HPVs. Most of them belonged to 19 previously known types or putative types, but two previously not described novel putative HPV types, SE46 and SE47, were also found.
Previously, we used 454 sequencing only after amplification of HPV sequences by broad general primer PCR [11]. The combined approach of broad general primer PCR and 454 sequencing of amplimers was highly efficient for detection of a multitude of viruses (subgenomic sequences from 110 previously known HPV types and from 44 novel putative HPV types were found) [11]. However, when the same samples were analysed in the present study two novel putative HPV types (SE46, SE47) were detected that had escaped detection when PCR was used prior to sequencing.
Although type-specific PCR with primers directed towards known sequences is unquestionably highly sensitive, it is debatable whether sensitive detection of very low amounts of virus is detecting a relevant infection. Cleansing of the skin surface with tape will remove most of the HPV positivity without detectably altering the skin architecture [21], suggesting that many HPV DNA positivities may reflect skin surface contamination that has not established an infection. Indeed, transcriptome sequencing of a series of HPV DNA-positive skin cancers did not find viral RNA expression [50], suggesting that the viral DNA detected did not reflect an infection in the lesion.
Foulongne and collegues used a similar unbiased approach as our investigation and reported a large number of sequence reads, including nine novel HPVs, when sequencing six samples from healthy forehead skin [12]. The larger amount of viral reads found compared to our study was most likely due to the fact that these authors used the Illumina HiSeq-2000 sequencing platform that can produce considerably more reads than the 454 system [51]. However, we opted for the 454 system, because the longer read length (,450 bp in 454 system and ,100 bp in Illumina) greatly facilitates a reliable bioinformatics assembly of raw reads to contigs, in particular for de novo assembly of new viruses where a reference sequence is not available. Foulongne and collegues used only forehead swabs, known to be a rich source of HPV DNA [52]. We used both swab samples, fresh-frozen biopsies from stripped skin and formalin-fixed paraffin-embedded lesions and report that viral sequences could be obtained from all these types of skin sample [21]. Comparison with the Ion Torrent systems found that their 300 bp version produced seven times more reads than 454 and that their 400 bp version produced about twice as many reads as 454 did. Thus, there is a trade-off between sequencing depth and sequencing length and the optimal choice may depend on whether the bioinformatics step can accurately handle short reads without risk for mis-assembly.
The skin cancer pool with samples that had previously been HPV positive by PCR [29], but that was now HPV-negative in the sequencing, had largely contained very low viral loads probably below the limit of detectability of the sequencing approach. The high cost of HTS and a tendency for variability in WGA efficiency precluded a more exact definition of the limit of detectability of the 454 sequencing approach, although it appears that samples with .5 copies of virus/cell would definitely be detectable. Increasing the sequencing depth using the Ion Torrent did detect seven additional known or putative HPV types, with low amount of reads, although no additional new HPV types were identified. It is notable that our sequencing detected SE46 in a pool of 142 samples where our type specific real-time PCR found that only five patients in the pool were positive for SE46 and only in moderate copy numbers. Also, the HTS detected HPV109 in a pool of skin cancer samples that had been negative for HPV by PCR [29]. HPV109 has several primer mismatches to the HPV general primer system used and is thus amplified poorly by this PCR, further demonstrating the advantage of our PCR-free method as this will detect the most abundant viruses present without being biased by the PCR primer sequences used.
Our methods that separated viral DNA from human DNA before WGA and sequencing were less successful for detecting viral reads than directly subjecting samples to WGA and sequencing. Even though the fractions collected from the ultracentrifugation had been carefully evaluated to contain viral particles and no human DNA (using control experiments with samples spiked with HPV 16 pseudovirions and realtime PCRs for viral and human DNA [29]) only 51 viral reads were detected. Conceivably, handling of low amounts of viral DNA may result in loss of material. As specimens may contain several closely related HPV types, the possibility exists that assembly algorithms may construct erroneous ''chimeric'' sequences by the assembly of 2 different sequences from different viruses. To prevent incorrect assemblies, we processed only reads having at least 80% of base pairs with quality score of $15 (base calling accuracy of 97%). Furthermore, we used stringent assembly parameters in the CABOG assembler and also manually investigated all viral sequences for possible missassemblies by assessing consensus quality scores. Also we used python (www.python.org) scripts to check for possible ''chimeric sequences'' by comparing segments of the sequences for differential similarities against the most closely related genomes in GenBank.
Screening for the new HPV155 and putatively new SE46 among patients with SCC, AK, BCC, SK and KA revealed only a few positive patients. There were low viral loads in biopsies, but the swab samples of some patients contained high copy numbers (up to 10251 copies per 2.5 ul). Presence of virus on top of lesions, but not in the lesions themselves, is suggestive of virus production at a site different from where the biopsy was taken.
The two putative novel HPV types reported in this study (SE46 and SE47) belonged to the Gammapapillomavirus and Alphapapillomavirus genera, respectively. Gammapapillomavirus is a rapidly growing HPV genus with at least 58 completely characterized types/genomes (including HPV155) [6,[8][9][10]12] and 119 subgenomic sequences detected using the general primer PCR FAP (FA-types), augmented with HTS (SE types) [11]. Detection of new types within the genus Alphapapillomavirus is nowadays less common. Most Alphapapillomavirus types are mucosal, but Alphapapillomavirus species 2, 4 and 8 are also found in cutaneous lesions. HPV94, which is the known virus most closely related to SE47, belongs to species Alphapapillomavirus-2.
The previously described putative type FA101 had a partial 431 bp sequence deposited in GenBank. As it was abundant in the pooled formalin-fixed paraffin embedded KA samples, we now report a complete viral genome of 7359 bp of the FA101 sequence. Another useful result of the sequencing was the identification of the complete HPV155 sequence. This was used to design primers for cloning and subsequent Sanger sequencing which confirmed the HTS-obtained sequence of this virus.

Conclusions
High throughput sequencing of skin lesions was useful for an unbiased assessment of viral DNA in these lesions. Most of the viral DNA was found to represent known HPV types or known subgenomic sequences of HPV.