Multiple-Integrations of HPV16 Genome and Altered Transcription of Viral Oncogenes and Cellular Genes Are Associated with the Development of Cervical Cancer

The constitutive expression of the high-risk HPV E6 and E7 viral oncogenes is the major cause of cervical cancer. To comprehensively explore the composition of HPV16 early transcripts and their genomic annotation, cervical squamous epithelial tissues from 40 HPV16-infected patients were collected for amplification of papillomavirus oncogene transcripts (APOT). We observed different transcription patterns of HPV16 oncogenes during the progression of cervical lesions to cervical cancer and identified one novel transcript. Multiple-integration events were significantly more frequent in cervical carcinoma (CxCa) tissues than in low-grade squamous intraepithelial lesions (LSIL) and high-grade squamous intraepithelial lesions (HSIL). Moreover, most cellular genes within or near these integration sites are cancer-associated genes. Taken together, this study suggests that multiple integrations of the HPV genome during persistent viral infection, which alter the expression patterns of viral oncogenes and integration-related cellular genes, play a crucial role in the progression of cervical lesions to cervical cancer.

To date, a full transcription map of oncogenic HPV16 and HPV18 in HPV-infected cells and raft tissues has been constructed [19,20].
It is well known that the integration of HPV genomes is a key event in cervical carcinogenesis [21,22]. Besides activating cellular oncogenes or inactivating cellular tumor suppressor genes [23-25], integration of the HPV genome into the host genome may change the transcription patterns of both viral and host genes [26]. It has been reported that integration can disrupt the viral E2 gene and thereby release its inhibition of the viral early promoter that controls the expression of E6 and E7 [27]. In addition, E6 and E7 transcripts cotranscribed with cellular sequences may be more stable, which would enhance their expression level [28-30].
Transcription patterns of HPV16 in cervical cancer tissues have been reported [26,31]. HPV16-infected tissues contained an episomal HPV early gene transcript (E7-E1^E4) and several integrated HPV transcripts (such as E7-E1^cellular RNA, E7-E1^E4-cellular RNA, etc.). However, transcriptional selection in response to environmental changes is a dynamic process that achieves optimal gene expression for cell survival and carcinogenesis [32]. In this study, we applied a modified technique of amplification of papillomavirus oncogene transcripts (APOT) [26] to comprehensively explore the structure and sequences of HPV16 E7-related transcripts and their genomic annotation in 8 LSIL (low-grade squamous intraepithelial lesions), 24 HSIL (high-grade squamous intraepithelial lesions), and 8 CxCa HPV16-positive cervical biopsy samples.

Patients and specimens
Tissue samples of primary uterine cervical lesions containing dysplastic epithelium/tumor cells were collected from the Second Affiliated Hospital of Wenzhou Medical University (Zhejiang Province, China) from December 2010 to April 2012. The presence of HR-HPV was detected by the HCII test, and HR-HPV-positive samples were screened for HPV16 with an HPV genotype detection kit (KaiPu, Guangzhou, China) [33]. None of the patients had received radiation therapy or chemotherapy before surgery, and each patient underwent a colposcopically directed biopsy. The collected biopsy specimens were bisected. One portion was submitted for standard histopathologic diagnosis, while the other portion was stored in RNAlater (Ambion, Austin, Texas, USA) at −80 °C for subsequent analysis. On the basis of the histopathologic diagnosis, the samples were divided into LSIL (CIN I, n = 8), HSIL (CIN II, n = 22; CIN III, n = 16) and cervical carcinoma (CxCa, n = 17). An additional 8 cervical tissue samples with normal cytology and negative for HPV DNA were obtained as controls from patients who underwent hysterectomy owing to benign gynecologic diseases. The study was approved by the Medical Ethics Committee of the Second Affiliated Hospital of Wenzhou Medical University. All women were informed and gave their written consent to participate in the study.

RNA and DNA Isolation from Clinical Samples
Total RNA from the biopsy samples described above was isolated using TRIzol reagent (Invitrogen, Calif., USA) according to the manufacturer's instructions. To remove residual DNA contamination, the RNA preparation was treated with RNase-free DNase I (Takara, Dalian, China) according to the manufacturer's protocol. Purified total RNA was dissolved in RNase-free water and stored at −80 °C. The concentration and purity of total RNA were assessed by ultraviolet spectrophotometry at 260 nm and 280 nm and by 1% agarose gel electrophoresis. Only RNA samples with an A260/A280 ratio of 1.8-2.0 and high integrity were used for further experiments.
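The purity criterion above can be illustrated with a minimal sketch. The function name and the example absorbance readings are hypothetical and are not taken from the study; the sketch covers only the A260/A280 ratio and does not model the integrity check on the agarose gel.

```python
def passes_purity_check(a260, a280, low=1.8, high=2.0):
    """Return True if the A260/A280 ratio lies within [low, high].

    The 1.8-2.0 window is the criterion stated in the text; the
    function itself is an illustrative helper, not the authors' code.
    """
    if a280 <= 0:
        # A non-positive A280 reading cannot give a meaningful ratio.
        return False
    ratio = a260 / a280
    return low <= ratio <= high


# Hypothetical readings, used only to show the call pattern.
print(passes_purity_check(0.95, 0.50))  # ratio 1.9, inside the window
print(passes_purity_check(0.80, 0.50))  # ratio 1.6, outside the window
```
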

Reverse Transcription and PCR Amplification of Transcripts
The previously reported APOT assay was based on nested PCR reactions [26], which could amplify only the abundant transcripts and missed transcripts present at lower levels in the samples. A modified APOT assay was therefore used to amplify the HPV oncogene transcripts. The primers for these reactions were designed according to Klaes R, et al. [26]. Total RNA (1 μg) was reverse transcribed using an oligo(dT)17 primer coupled to a linker sequence RT [34] according to the manufacturer's protocol for the reverse transcriptase kit (TOYOBO, Japan). To verify first-strand cDNA quality, PCR using glyceraldehyde-3-phosphate dehydrogenase (GAPDH)-specific primers was performed as previously described [35]. First-strand cDNA encompassing viral oncogene sequences was subsequently amplified by PCR using the p1-HPV16E7-specific primer (5'-CGGACAGAGCCCATTACAAT-3') as the forward primer and the linker p0 (5'-GACTCGAGTCGACATCG-3') as the reverse primer; the PCR amplification was carried out in a reaction volume of 50 μl. In contrast to previous reports, the number of PCR cycles was increased to 35, and every specimen underwent only a single round of PCR. To verify the specificity of this procedure, a "minus-RT" control, in which reverse transcriptase was omitted from the reactions, was performed in parallel. The APOT amplification products were visualized by 2.5% agarose gel electrophoresis. PCR products of interest were excised from the gel and extracted using a DNA agarose gel recovery kit (TianGen, Beijing, China).
The corresponding amplimers were cloned into a cloning vector (TransGen, Beijing, China), and DNA sequence analysis was carried out using an ABI 3730 XL Genetic Analyser (Applied Biosystems, USA) according to standard protocols. Sequencing results were analyzed using the BLASTn program provided by the National Center for Biotechnology Information (NCBI), USA. Additionally, the chromosomal integration sites were ascertained using the NCBI BLAST service and the resources of the European Molecular Biology Laboratory (EBI). Moreover, the fragile sites and genes at the integration sites were defined using the NCBI fragile site map viewer and the UCSC BLAT tool.

Figure 2 (caption). Type A shows E1 sequences spliced directly to cellular flanking sequence; Type B shows E1 spliced to E2, with E2 fused with a cellular sequence; Type C shows E1 spliced to E4, with E4 running into a cellular sequence. Marked position: there are two integration sites in E1 (data shown in Figure S3). The boxes within slashes represent six nucleotides between the E7 and E1 genes. doi:10.1371/journal.pone.0097588.g002

Specificity of APOT assay for HPV16 oncogene transcripts
The APOT assay is based on 3' rapid amplification of cDNA ends (RACE) PCR, which amplifies and clones the region between a single short sequence in a cDNA molecule and its unknown 3' end [34]. In general, integrated transcripts derived from the E6 and E7 oncogenes encompass viral sequences at their 5' ends and host genome sequences at their 3' ends [26]. The expected size of products obtained from an episome-derived transcript is 1050 bp [26]. Amplimers that display a size different from 1050 bp may therefore be derived from an integrated HPV genome. To test the specificity of the modified APOT assay, we used cDNAs from HPV16-positive CaSki cells, which contain the integrated HPV16 genome, and from HPV-negative normal cervical tissues, as well as "minus-RT" controls in which reverse transcriptase was omitted from the reactions. The amplified products of the cDNAs from HPV16-positive CaSki cells were similar to those in the previous report [26], whereas no amplification product was obtained from the normal cervical tissues without HPV DNA or from the "minus-RT" control (Figure 1). These data indicate that the modified APOT assay can specifically detect transcripts derived from the integrated HPV genome.
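The size-based reasoning above can be sketched as a simple decision rule. This is an illustrative sketch only: the 1050 bp reference size comes from the text, whereas the function name and the tolerance for gel-sizing error are assumptions introduced here, and any amplimer flagged this way would still need sequencing to confirm its origin.

```python
EPISOMAL_PRODUCT_BP = 1050  # expected episome-derived product size [26]


def classify_amplimer(size_bp, tolerance_bp=25):
    """Label an amplimer by its apparent size on the gel.

    tolerance_bp is a hypothetical allowance for sizing error on an
    agarose gel; the text does not specify one.
    """
    if abs(size_bp - EPISOMAL_PRODUCT_BP) <= tolerance_bp:
        return "episome-like"
    return "candidate integrate-derived"


# Hypothetical band sizes, used only to show the call pattern.
for band in (1050, 1040, 600, 1400):
    print(band, classify_amplimer(band))
```

The rule is deliberately one-sided: a band near 1050 bp is only "episome-like", because an integrate-derived transcript could by chance have a similar length.
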

Characteristics of HPV16 oncogene transcripts in the tissues of cervical intraepithelial neoplasia and cervix carcinoma
To analyze the HPV16 oncogene transcripts, 40 HPV16-positive cervical specimens (LSIL, n = 8; HSIL, n = 24; CxCa, n = 8) with good-quality RNA were selected from the 63 samples collected in this study. A total of 133 transcripts containing viral fragments were found. Among these transcripts, 64 fragments had HPV16 E7-E1* sequences at their 5' ends and were directly connected to poly A at their 3' ends (Figure S1). Furthermore, there were four different disruptions of the E1 region, at nt 880, 949, 1054 and 1234 (Figure S2). The transcripts containing an E1 splice donor signal at nt 880 [36] might belong to a potential episomal pattern, whereas the transcript truncated at nt 949 might be a result of internal priming by oligo(dT) [37]. The other transcripts, truncated at nt 1054 and 1234, contained neither poly A sequences nor any viral or host polyadenylation site, so these transcripts were viewed as potential integrated patterns. In addition, we found another transcript in which the E7 ORF is spliced at nt 880 to the E4 splice acceptor site at nt 3358 and then spliced from the E4 splice donor signal at nt 3632 to the L1 splice acceptor site at nt 5639, and which also terminates in poly A (Figure S1). In this transcript, the E4 ORF is not disrupted. The lack of a splice donor signal at nt 5815 in this transcript indicates that disruption of the HPV16 genome at nt 5815 might also take part in viral genome integration.
In addition, there were 64 viral transcripts directly connected to host genome sequences, all of which began at the start of the forward primer (p1) at nt 729. These HPV16 oncogene integrated transcripts could be divided into three different types (Figure 2). Type A has HPV16 E7-E1* sequences at its 5' end, directly connected to host genome sequences. However, there were two different integration sites in the E1 region (at nt 880 and nt 1107) in this type (Figure S3). The site at nt 880 contained an E1 splice donor signal, while truncation at nt 1107 is more likely to reflect linearization of the circular viral genome for integration into the host genome. Type B has an E2 ORF disrupted at nt 2870 and is composed of HPV16 E7-E1^E2* at its 5' end and host genome sequence at its 3' end. In Type C, the E1^E4 stop codon is disrupted by viral integration, and an entire E1^E4 ORF without a stop codon is fused in frame to host sequence. Among these three patterns, transcripts of Type A and Type C had been reported by Wentzensen N, et al. [31]. However, the Type B transcript had not previously been reported in precancerous lesions or cervical cancer.
Moreover, HPV16 oncogenes showed significantly different transcription patterns in the tissues of LSIL, HSIL and CxCa (Figures 3 and 4, Table 1). Among the three transcription patterns detected in our patients, Type A and Type B were more prevalent than Type C and were observed in almost all pathological types, whereas Type C was detected only in CxCa samples, with a detection frequency of 75% (Table 1 and Figure 4). All patient samples displayed Type A, but all CxCa samples had Type B and Type C (Figure 4). Consistent with the presumption that integration of the viral genome occurs in the later stages of cancer development [38,39], the prevalence of fusion transcripts was higher in HSIL and CxCa than in LSIL.
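The three fusion-transcript types described above differ in which viral segments precede the host sequence, which can be sketched as a small lookup. This is a hypothetical illustration: the segment labels, the list representation and the function name are assumptions made here, and the study assigned types from sequencing data rather than from code like this.

```python
# Ordered viral segments (5' to 3') that precede the host sequence,
# mapped to the transcript type named in the text.
TYPE_BY_VIRAL_SEGMENTS = {
    ("E7", "E1"): "A",        # E7-E1* joined directly to host sequence
    ("E7", "E1", "E2"): "B",  # E7-E1^E2* followed by host sequence
    ("E7", "E1", "E4"): "C",  # E7-E1^E4 fused in frame to host sequence
}


def classify_fusion(segments):
    """Return 'A', 'B' or 'C' for a viral-cellular fusion, else None.

    segments is an ordered list of labels ending in 'host' for fusion
    transcripts; anything else (for example a poly A ending) is not
    treated as a fusion transcript.
    """
    if not segments or segments[-1] != "host":
        return None
    return TYPE_BY_VIRAL_SEGMENTS.get(tuple(segments[:-1]))


print(classify_fusion(["E7", "E1", "host"]))        # A
print(classify_fusion(["E7", "E1", "E2", "host"]))  # B
print(classify_fusion(["E7", "E1", "E4", "host"]))  # C
```
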

Integration sites and characterization of the cellular flanking sequence
To identify the individual chromosomal locations, all 64 fusion transcripts containing viral and cellular sequences were further analyzed by BLASTn comparison against the whole-genome database. Our data show that the HPV16 genome was integrated into all chromosomes except Chr21 and X, confirming previous reports that HPV integration shows no preference among the human chromosomes [40]. Some loci, such as 1p36.22, 1p36, 2p24, 2q33, 5q31.1, 5q31, 6p24, 8p23, 10q22.1, 13q22.1, 19q13 and 19p13.3, were reported previously [31,41-45] (Table 2). Among these integration events, fourteen of 40 samples exhibited multiple integration sites (Table 2). Although local DNA rearrangements can happen frequently and rapidly after integration [43], we found that the cellular flanking sequences in 11 tissues mapped to different chromosomes, indicating the presence of multiple independent integrations in these samples. Moreover, we found that multiple-integration events were significantly more frequent in CxCa tissues (75%) than in the cervical tissues of LSIL (50%) and HSIL (53.8%). Screening of all integration loci indicated that 35 of 63 mapped integration sites were located in or close to a fragile site, at a distance of 26 bp to 5 Mbp (Table 2). Among the 22 mapped fragile sites, FRA13A was found in 4 independent samples. Twenty-two transcripts were not associated with any fragile site.
The cellular flanking sequences of the viral-cellular fusion transcripts were further examined for known genes. Most of these fused transcripts had a cellular sequence in the coding orientation of known genes; thirty transcripts had cellular sequence from an intron region, and 8 transcripts were fused with a sense exon sequence of the predicted genes (Table 2). Among these predicted genes, AMICA1, DAPK1, EBAG9 and PIBF1 were affected twice, MRPS31 four times and PRDX5 even six times by viral integration. The nearest host genes to each integration site in the direction of transcription were also analyzed (Table 2). Among the predicted genes located at or close to the integration sites, we identified several tumor-associated genes, including PRDX5, CD28, ROCK2, RHOH, TIMP3 and DAPK1. As shown in Table 1, the transcripts of type D and E were detected only in CxCa, and most of their integration loci were located in or close to the fragile sites FRA13C, FRA22B, FRA2I and FRA13A. The genes associated with the transcripts of type D and E were oncogenes (CD28 and EBAG9), tumor suppressor genes (TIMP3), or tumor-related genes (PIBF1 and MRPS31).
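As a quick arithmetic check, the fragile-site counts quoted above can be converted to a percentage. The helper name is hypothetical; the counts (35 of 63 mapped integration sites) are those given in the text.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# 35 of 63 mapped integration sites were in or close to a fragile site.
print(f"{percent(35, 63):.1f}%")  # 55.6%
```
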

Discussion
Integration of the HPV genome into host chromosomes represents an early clonal event that provides an additional selective advantage for the expansion of the neoplasm. Viral transcripts have been detected by the APOT assay [26,31,42-45]. Although the APOT assay has some advantages in detecting transcripts from each chromosomal integration site, it has several limitations. First, it is difficult to amplify very long integration-derived transcripts, which leads to underestimation of the number of tumors with integrated HPV DNA [45]. Second, APOT is a type of nested PCR, which tends to amplify transcripts present at higher levels and to miss those present at lower levels. Third, it has been reported that oligo(dT) primers can, within certain limits, anneal to internal poly A stretches, and sets of anchored oligo(dT) primers have been generated for cDNA synthesis. Sequences produced by internal priming interrupt the generation of full-length cDNA and confound the analysis of alternative splicing [37]. Using our modified APOT assay to detect transcription patterns in cervical tissues, we found many viral transcripts connected with poly A or host genome sequences in HPV16-infected cervical squamous epithelial tissues. We noticed that many viral transcripts ended directly with poly A at their 3' ends. Apart from the reported E1 splice donor site (nt 880), the truncation sites at nt 1054, 1234 and 5815 contained neither internal poly A sequences nor polyadenylation signals; they may therefore be novel integration sites and need further analysis. The viral-cellular fusion transcripts of Type A and Type C have been reported previously [26,31,41]. In the Type C transcript, disruption of the E4 termination codon by integration would cause E4 to use a host termination codon. In this study, we also noticed that some cervical cancer samples contained all three types of viral-cellular fusion transcripts.
HPV16 transcription patterns in LSIL, HSIL, and CxCa were significantly different. We found that the Type C transcript was detected only in CxCa samples and that more random integration sites existed in our tissue samples. Similar to previous reports [31,38-42,46,47], our study indicates that HPV integration has no preferential site in the human genome. Except for chromosomes 21 and X, all chromosomes are susceptible to HPV16 integration. Approximately 55% of integrations are located in or close to a fragile site. In contrast to previous reports [42,45], we noticed that multiple integration events occurred significantly more often in cervical cancer than in LSIL and HSIL. These data not only provide biological support for the epidemiologic observation that persistent infection by specific types of HR-HPV is an important cause of cervical carcinoma [1], but also indicate that subsequent selection for and accumulation of mutations in yet-to-be-identified key cellular regulatory genes promotes further progression to cervical cancer.

Table 2 (footnotes). "-", no entry of fragile sites or nearest genes; n.a., not applicable because the fusion transcript is in antisense orientation. Genes highly relevant to cervical cancer that are located at integration sites are indicated in italics, and underlined genes are related to tumors. doi:10.1371/journal.pone.0097588.t002

Integration not only changes the transcription pattern relevant to the dysregulated expression of the viral oncogenes, but also affects the expression of the host gene into which the viral genome is integrated. Integration alters the expression of host genes at integration sites even if it occurs within intron sequences [43,45]. In our study, we identified a broad spectrum of cancer-associated genes at the integration sites and in the flanking sequence regions. Most of the genes at the integration sites were associated with tumor development, and nineteen genes were strongly related to cervical cancer. Some of them act as tumor suppressors (such as miR-34a, MSH2, WWOX and TIMP3) or oncogenes (such as ROCK2, CD28, EBAG9 and ANGPT1). Interestingly, most of them were not reported in previous studies [31,45]. MiR-34a, an important tumor suppressor, is down-regulated in cervical cancer [48,49]. It has been reported that the oncoprotein E6 of HPV16 and HPV18 can inhibit the expression of tumor-suppressive miR-34a by destabilizing p53, resulting in cell proliferation [50]. Disruption of the miR-34a gene might further explain the reduced expression of miR-34a in cervical cancer. MSH2 is a DNA mismatch repair protein associated with the DNA repair pathway [51,52]. Decreased expression of MSH2 might be a risk factor in early-stage cervical cancer [53]. ROCK2, an important signaling molecule, can promote cervical cancer metastasis by upregulating the expression and activating the function of moesin protein through the RhoA/ROCK2 pathway [54]. Besides the cancer-associated genes, the genes at integration sites and in flanking sequence regions might also be beneficial for viral genome integration.
FANCM, a DNA translocase closely involved in DNA replication, regulates checkpoint signaling and replication fork progression [55,56]. Other genes have also been implicated: COX6B1 is related to apoptosis [57], and ESRRA has been reported to be associated with cervical cancer [58]. In addition, among 45 integration events, 13 led to antisense transcription of coding sequences, such as those of PRDX5, EBAG9 and CD28. Such integrations have generally been deemed of no interest. However, the corresponding sense sequences were associated with DNA repair or tumor development and might affect both host and viral gene expression during the development of cervical cancer. The gene most frequently integrated in the antisense orientation was that encoding peroxiredoxin 5 (PRDX5), a protective enzyme against oxidative stress [59,60]. Its altered expression due to HPV16 integration could have significant virological consequences, along with integration into DNA repair genes such as FANCM and MSH2. Upregulation of EBAG9 expression has been observed in several malignant tumors [61]. The co-stimulatory factor CD28, which maintains immune homeostasis, plays a role in increasing susceptibility to cervical cancer [47].
In conclusion, changes in the transcription patterns of HPV16 early genes accompany the progression from cervical intraepithelial neoplasia to cervical carcinoma and the integration of the viral genome into host chromosomes. The change or selection of transcription patterns and the effect of integration on the expression of host genes at the integration sites and in flanking cellular sequence regions might all take part in the oncogenesis of HPV16-induced cancers.

Figure S1. The types of viral sequences connected with poly A at their 3' ends. Class I shows E1 sequences ending directly with poly A; Class II shows E1 spliced to E4 and then to L1, also ending with poly A sequences. Marked position: there are several truncation sites in E1 (data shown in Figure S2