E6 and E7 Gene Polymorphisms in Human Papillomavirus Types-58 and 33 Identified in Southwest China

Cancer of the cervix is associated with infection by certain types of human papillomavirus (HPV). The gene variants differ in immune responses and oncogenic potential. The E6 and E7 proteins encoded by high-risk HPV play a key role in cellular transformation. HPV-33 and HPV-58 types are highly prevalent among Chinese women. To study the gene intratypic variations, polymorphisms and positive selections of HPV-33 and HPV-58 E6/E7 in southwest China, HPV-33 (E6, E7: n = 216) and HPV-58 (E6, E7: n = 405) E6 and E7 genes were sequenced and compared to others submitted to GenBank. Phylogenetic trees were constructed by Maximum-likelihood and the Kimura 2-parameters methods by MEGA 6 (Molecular Evolutionary Genetics Analysis version 6.0). The diversity of secondary structure was analyzed by PSIPred software. The selection pressures acting on the E6/E7 genes were estimated by PAML 4.8 (Phylogenetic Analyses by Maximun Likelihood version4.8) software. The positive sites of HPV-33 and HPV-58 E6/E7 were contrasted by ClustalX 2.1. Among 216 HPV-33 E6 sequences, 8 single nucleotide mutations were observed with 6/8 non-synonymous and 2/8 synonymous mutations. The 216 HPV-33 E7 sequences showed 3 single nucleotide mutations that were non-synonymous. The 405 HPV-58 E6 sequences revealed 8 single nucleotide mutations with 4/8 non-synonymous and 4/8 synonymous mutations. Among 405 HPV-58 E7 sequences, 13 single nucleotide mutations were observed with 10/13 non-synonymous mutations and 3/13 synonymous mutations. The selective pressure analysis showed that all HPV-33 and 4/6 HPV-58 E6/E7 major non-synonymous mutations were sites of positive selection. All variations were observed in sites belonging to major histocompatibility complex and/or B-cell predicted epitopes. K93N and R145 (I/N) were observed in both HPV-33 and HPV-58 E6.


Introduction
Cervical cancer is the third most common cancer in women worldwide, and a persistent infection of the high-risk human papillomavirus (HPV) is a major risk factor for cervical cancer a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Study population and specimen collection
Between January 1, 2009 and December 31, 2015, 16793 (age range 16-87, average age 33.02, median age 29) cervical specimens were collected from women who underwent cervical screenings, histology and cytology evaluations for cervical diseases at Sichuan Reproductive Health Research Center Affiliated Hospital, The Angel Women's and Children's Hospital, The Chengdu Western Hospital Maternity Unit, The People's Hospital of Pengzhou, and Chongqing The Fourth Hospital. Women over 14 years of age and with visible cervical lesions and/or HPV-related diseases (e.g., cervicitis, cervical intraepithelial neoplasia, and cervical cancer) were eligible for inclusion [20]. Specimens were collected from the participants and stored in a preservative buffer (NaCl 9g, C 6 H 5 CO 2 Na 10g, H₂O 1L) at -20˚C.

PCR amplification and identification of variants
The complete E6/E7 genes of HPV-33 were first amplified by polymerase chain reaction (PCR) in the thermal cycler (Longgene, Hangzhou, China) using the primers: 5'-AAAAAAGTAGG GTGTAACCGA-3' and 5'-TGCCACTGTCATCTGCTGT-3'; this step was followed by a second round of amplification using the inner primers: 5'-ACGGTGCATATATAAAGCAAACA TT-3' and 5'-CTTCTACCTCAAACCAACC-AGTACA-3', when needed. The HPV-58 E6/E7 fragment was amplified using specific primers described previously [24]. Each 50 μL PCR reaction mixture contained 5 μL of extracted DNA (10-100 ng), 200 μmoL MgCl2 and dNTPs, 2 U of Pfu DNA polymerase (Sangon Biotech, Shanghai, China), and 0.25 μmoL of each primer. The cycling conditions employed were as follows: 95˚C for 10 min; 35 cycles of 94˚C for 50 sec, 54˚C (different for each gene) for 60 sec, 72˚C for 60 sec; and a final step of 72˚C for 7 min. The PCR amplification products were visualized on 2% agarose gels stained with Gene-Green nucleic acid dye under the ultraviolet light WFH-202. Target products were sequenced at Sangon Biotech (Shanghai, China), and all the data were confirmed by repeating the PCR amplification and sequence analysis at least twice.

Sequence analysis
The secondary structures of the reference proteins were predicted by PSIPred server (http:// bioinf.cs.ucl.ac.uk/psipred/) with the default parameters, which provided a simple and accurate secondary structure prediction method. Using a very stringent cross validation technique to evaluate the method's performance, PSIPRED 3.2 achieves an average Q3 score of 81.6% [25]. The data were analyzed using SPSS version 19 (IBM, Armonk, NY, USA). The Pearson χ2 test was used to confirm the results. P < 0.05 was considered statistically significant. A mutation of which the frequency ! 10% was considered as a major mutation.

Results
Among all the HPV-58 and HPV-33 samples, 405 sequences of HPV-58 E6/E7 gene and 216 sequences of HPV-33 E6/E7 gene were obtained owing to the small number of copies of infected HPV in some women and limited amplicons obtained for sequencing.
Altogether, 11 nucleotide positions of E6/E7 fragment showing sequence polymorphisms were identified among the 216 isolates. 8 biallelic mutations occurred in the 450-bp E6 Open Reading Frame (ORF); more specifically, 6/8 were non-synonymous substitutions and 2/8 were synonymous mutations. In addition, two biallelic mutations and one triallelic (C706A/T) mutation were found over the 294-bp E7 ORF, resulting in 4 amino acid changes of S29T, A45V, A45E, and Q97L ( Table 1). The sequence variability of E7 was lower than that of E6. The average probability of a nucleotide sequence deviation from the prototype was 20.37 substitutions per 10000 bp for E6 and 15.28 substitutions per 10000 bp for E7. Mutations generating a frame shift or a premature stop codon were not observed.
Two non-synonymous mutations (S74T and Q113R) occurred in the HPV-33 E6 sequences encoding the alpha helix. No non-synonymous mutation was detected in the HPV-33 E7 sequences encoding the alpha helix or the beta sheet.
Among the 216 determinable samples of HPV-33, the A1 and A3 variants were found in 199 (92.13%) and 17 (7.87%) samples, respectively (Table 1 and Fig 1). All of the variants belonged to prototype-like groups (lineage A); whereas no nonprototype-like variant (lineage B) of HPV-33 was observed in our study (Fig 1).

Gene polymorphism of HPV-58 E6/E7
When compared with the HPV-58 reference sequences (GenBank: D90400), 11.85% (48/405) of the isolates showed complete E6/E7 sequence homology with the reference, and the nucleotide variation rate of HPV-58 E6/E7 was 88.15% (357/405) among the 405 HPV-58 isolates. Overall, 8 nucleotide sequence variations were detected in the 450-bp E6 ORF; specifically, 4 mutations were non-synonymous and 4 mutations were synonymous. 13 single nucleotide changes were identified in the 297-bp E7 ORF, out of which 10 substitutions were non-synonymous and 3 substitutions were synonymous mutations ( Table 2). Sequence variability was higher for E7 than for E6. The average probability of a nucleotide sequence deviation from  2  2  3  3  3  4  5  5  6  7  8   1  7  2  6  8  4  4  4  5  0  6   3  3  9  4  7  6  2  9  8  6  2 M12732 The nucleotides conserved with respect to the reference sequence were marked with a dash (-), whereas a variation position was indicated by a letter. The "S" in the last row of the table means Sheet, the "H" means Helix. "n" represents the number of each sequence pattern among the 216 samples examined. There were no mutations generating a frame shift or a premature stop codon. In addition, there were no non-synonymous mutations in the HPV-58 E6 sequences encoding the alpha helix or the beta sheet; while three non-synonymous mutations (T74A, D76E, and V77A) occurred in HPV-58 E7 sequences encoding the alpha helix.
The nucleotides conserved with respect to the reference sequence were marked with a dash (-), whereas a variation position was indicated by a letter. The "S" in the last row of the table means Sheet, the "H" means Helix. "n" represents the number of each sequence pattern among the 405 samples examined.

Selective pressure analysis and homology comparison
The variable dN/dS ratios were tested among various lineages using the PAML 4.8 software.

Predicted MHC and B-cell epitopes
The results of major histocompatibility complex (MHC) epitopes (only epitopes binding not less than 10 HLA class alleles were shown) and B-cell epitopes (only score ! 0.85 were shown), as predicted, were summarized in Fig 3, the details were shown in S3-S5 Tables. For MHC I, 8 good epitopes were obtained for HPV-33 E6, 5 for HPV-33 E7, 11 for HPV-58 E6 and 5 for HPV-58 E7, respectively; For MHC II, 6 good epitopes were obtained for HPV-33 E6, 6 for HPV-33 E7, 9 for HPV-58 E6 and 6 for HPV-58 E7, respectively; For B-cell, 1 high score epitope was obtained for HPV-33 E6, 3 for HPV-33 E7, 2 for HPV-58 E6 and 3 for HPV-58 E7, respectively. Amino acids change may influence the epitope's binding ability, decrease or increase the binding ability, or even lead to disappearance of epitopes or appearance of new epitopes (S3-S5 Tables).

Discussion
HPV types are divided into different genus according to their biological characteristics. HPV-33 and 58 are known to be closely related to each other, and belong to the α-9 species, which is the principal species consisting of almost all carcinogenic types [44]. The intratypic variations observed in E6 and E7 can offer useful information for the distinction and identification of known or new HPV types [45,46]. The present study revealed higher frequencies of HPV-58 E6/E7 variations than HPV-33 E6/E7. Furthermore, HPV-33 E7 was observed to be steadier than E6; therefore, E7 was selected as a more suitable target for diagnostic detection of HPV-  Tables 3 and 4: ln L, the log-likelihood difference between the two models; 2Δl, twice the log-likelihood difference between the two models; the positively selected sites were identified with posterior probability ! 0.9 using Bayes empirical Bayes (BEB) approach, an asterisk indicates posterior probability ! 0.95, and two asterisks indicate posterior probability ! 0.99. NA means not allowed. NS means the sites under positive selection, but not reaching the significance level of 0.9. doi:10.1371/journal.pone.0171140.t004 E6 and E7 Gene Polymorphisms in Human Papillomavirus Types-58 and 33 33 than E6. On the contrary, HPV-58 E6 was steadier than E7 (χ2 = 16.015, P < 0.01); and was chosen as a more appropriate target for diagnostic detection of HPV-58 than E7. Previous studies revealed two main groups-prototype-like group (lineage A) and nonprototype-like group (lineage B) among HPV-33 variants [27,29]. However, all of the HPV-33 variants belonged to the prototype-like group, and the nonprototype-like group was not observed in the determinable samples of our study; these findings were similar to the results obtained by Chen et al and Godinez et al [29,30]. The A1 sub-lineage (92.13%) was determined as the major type in southwest China. In a worldwide study on E6/E7 segment of 213 HPV-33 samples, the A2 sub-lineage (59.72%) was the main type in the Asia and Oceania region [29]. However, no A2 sub-lineage was detected in this study. 7.87% of the sub-lineage belonged to A3.
In the current research, the prototype-like group of HPV-58 was the dominating variant; this classification of lineages was in agreement with the results of previous studies [36,47]. Nevertheless, the distribution of HPV-58 variant lineages around China was observed to be different in our study; B1 variant was a nonprototype variant, while the C variant was a nonprototype variant lineage in Hong Kong and Tai Wan [36,47]. When compared with a previous report on HPV-58 in southwest China [33], B lineages were newly found.
To the best of our knowledge, the present study is the first to contrast the HPV-33 and HPV-58 E6/E7 sites under positive selection. The key characteristic of positive selection is that it causes an unusually rapid rise in allele frequency, under positive selection, the positive mutation(s) may rise to high frequency rapidly, help species to adapt to the environments [48]. The selective pressure analysis showed that all the sites that evolved under positive selection were common non-synonymous mutations, indicating that the positively selected variations beneficial for HPV-33 and HPV-58 to accommodate their environments are wide-spread. HPV-33 falls next to HPV-58 in the phylogenetic tree [44]; and remarkably, the positive sites K93N and R145 (I/N) were observed in both HPV-33 and HPV-58 E6, they may have evolutionary significance in making HPV-33 and HPV-58 adaptive to their environments. The HPV-58 E6 mutations T20I and G63S have been reported to increase the risk of developing cervical cancer [36]; interestingly, in the present study, we found that these two mutations were positively selected. Specific intratypic HPV genome variations may be related to virus infectivity, pathogenicity, progression to cervical cancer, viral particle assembly, and host immune response. Of all these variations in the present study, the five newly-reported mutations have been only found in southwest China until now: G329C (S74T) and G542T (R145I, a positive variation) for HPV-33 E6, D658C (S29T, a positive variation) for HPV-33 E7 as well as A259G and T395C for HPV-58 E6 [24,29,[34][35][36]47,[49][50][51].
Amino acid positions 145-149 form the PDZ binding domain in the E6 protein; amino acid positions 21-29 form short linear motif responsible for Rb binding in E7 protein; whereas the positions 58, 61, 91, and 94 act as Zn binding sites in the E7 protein [29,52]. In the present study, G542T (R145I, a positive mutation) in HPV-33 E6 and G543A (R145K, a positive mutation) were found at residues 145-149; G658 (S29, a positive mutation) in HPV-33 E7 was found at residues 21-29, G632T (T20I, a positive mutation) in HPV-58 E7 was found beside residues 21-29; C755A (T61N) in HPV-58 E7 was found at residues involved in Zn binding for E7 protein. Until now, there is no published data to prove that immunity to one variant can prevent another variant in HPV infections [53].
Modern immunoinformatics provide new strategies for the design and identification of antigen-specific epitopic sites that could be used as vaccine targets. The prediction of MHC and Bcell epitope in this study can be potentially used for the vaccine development against specific HPV variants in the Chinese population. In the present study, variations other than HPV-33 E6 K93N, HPV-33 E7 Q97L and HPV-58 E7 T20I were observed in sites belonging to ideal B-cell and/or MHC predicted epitopes. Some variations, like HPV-58 E6 R145K, occurred at the sites belonging to both B-cell and MHC epitopes; Amino acids change may influence the epitopes, and then the immune recognition of HPV-infected cells. For example, the score of HPV-58 E6 B-cell epitope 81-96YSLYGDTLEQTLKKCL was 0.90, because of the mutation D86E, the score of epitope 81-96YSLYGETLEQTLNKCL decreased to 0.86; the HPV-58 E7 predicted epitope 77-85VRTLQQLLM disappeared because of the mutation V77A. However, further experimentation is required to validate the prediction through immunoinformatics. This is the first study to examine the changes in E6/E7 epitopes of HPV-58 variants in southwest China and to report the variants of HPV-33 E6/E7 in China. The data presented in this study may have significant implications in understanding the intrinsic geographical relatedness and biological differences between HPV-33 and HPV-58 E6/E7, and may also contribute to the design of clinical diagnostic probes and second-generation therapeutic vaccines based on HPV-33 and HPV-58 E6/E7. Supporting Information S1