Genome-wide host methylation profiling of anal and cervical carcinoma

HPV infection results in changes in host gene methylation which, in turn, are thought to contribute to the neoplastic progression of HPV-associated cancers. The objective of this study was to identify joint and disease-specific genome-wide methylation changes in anal and cervical cancer as well as changes in high-grade pre-neoplastic lesions. Formalin-fixed paraffin-embedded (FFPE) anal tissues (n = 143; 99% HPV+) and fresh frozen cervical tissues (n = 28; 100% HPV+) underwent microdissection, DNA extraction, HPV genotyping, bisulfite modification, DNA restoration (FFPE) and analysis by the Illumina HumanMethylation450 Array. Differentially methylated regions (DMR; t test q<0.01, 3 consecutive significant CpG probes and mean Δβ methylation value>0.3) were compared between normal and cancer specimens in partial least squares (PLS) models and then used to classify anal or cervical intraepithelial neoplasia-3 (AIN3/CIN3). In AC, an 84-gene PLS signature (355 significant probes) differentiated normal anal mucosa (NM; n = 9) from AC (n = 121) while a 36-gene PLS signature (173 significant probes) differentiated normal cervical epithelium (n = 10) from CC (n = 9). The CC progression signature was validated using three independent publicly available datasets (n = 424 cases). The AC and CC progression PLS signatures were interchangeable in segregating normal, AIN3/CIN3 and AC and CC and were found to include 17 common overlapping hypermethylated genes. Moreover, these signatures segregated AIN3/CIN3 lesions similarly into cancer-like and normal-like categories. Distinct methylation changes occur across the genome during the progression of AC and CC with overall similar profiles and add to the evidence suggesting that HPV-driven oncogenesis may result in similar non-random methylomic events. Our findings may lead to identification of potential epigenetic drivers of HPV-associated cancers and also, of potential markers to identify higher risk pre-cancerous lesions.


Introduction
In the United States, approximately 43,999 HPV-associated cancers are diagnosed annually, including malignancies of the oropharynx, anus, cervix, vulva, vagina, and penis [1,2]. Cancers of the oropharynx, anus and vulva are among the handful of malignancies for which incidence continues to rise steadily. For example, the incidence of anal cancer (AC) rose from 0.8 cases per 100,000 in 1975 to 1.8 cases per 100,000 per year based on 2010-2014 data [3]. Despite the availability of an HPV-targeted vaccine, these diseases continue to be a substantial burden to both US and worldwide populations due to low uptake and the lag time of preventive effect [4].
The progression of normal epithelium through cervical intraepithelial neoplasia (CIN1 to CIN3) to cervical cancer (CC) is well described. Although approximately 30% of patients with untreated CIN3 will go on to develop invasive cancer, the majority do not [5]. AC is also known to develop through intraepithelial neoplasia (i.e., AIN1-3) and, similar to CIN2/3 in cervical cancer, the majority of AIN3 lesions do not progress to cancer. Given that these high-grade anal and cervical lesions are routinely managed by surgical excision/ablation, a substantial number of patients are inherently overtreated. Consequently, the development of biomarkers for more selective treatment of high-risk premalignant lesions would be of significant clinical value [6].
DNA methylation is a key aberrant epigenetic event that has been documented in virtually every tumor type studied and is amongst the earliest disease-associated changes observed during tumorigenesis [7]. HPV may influence the host transcriptome through a number of epigenetic mechanisms including HPV E7 oncoprotein-mediated alterations in DNA methyltransferases [8,9]. There is growing evidence to suggest that HPV-associated oncogenesis in different organ sites may be associated with common non-random genome-wide methylation events [10].
We and others have observed differential methylation patterns across the spectrum of anal squamous neoplastic progression including normal tissue, pre-cancerous lesions, and anal carcinoma [10][11][12]. Similarly, differential methylation of various genes is detected when comparing normal cervical tissue to CC, as well as to cytologically identified high-grade and low-grade intraepithelial lesions (HSIL, LSIL) [13,14]. A series of studies from the University of Amsterdam [15][16][17][18], using panels of 6 to 12 selected methylation markers derived from a methylation signature of cervical neoplasia, has suggested that both high-grade cervical and anal lesions may represent heterogeneous entities that harbor lower and higher risks for cancer progression [19]. In this study, exclusively using genome-wide methylation analyses, we sought to identify comprehensive methylomic profiles, both disease-specific and shared, that differentiated anal or cervical cancer from normal mucosa, and examined whether signatures derived from our methodologic approach were able to identify potential higher- and lower-risk high-grade anal and cervical lesions.

Anal and cervical tissues
Formalin-fixed paraffin-embedded (FFPE) anal tissues were identified from the NRG Oncology/ RTOG 98-11 biorepository at the UCSF Medical Center-Mount Zion. The 98-11 trial evaluated combinations of external beam irradiation plus chemotherapy in a phase III randomized controlled treatment trial of anal squamous cell carcinoma (SCC; concurrent 5-Fluorouracil (FU)/mitomycin-C vs. 5-FU/cisplatin) in patients enrolled between October 1998 and June 2005 [20]. Patients with a primary diagnosis of T1 or M1, severe comorbid conditions (including AIDS), or prior malignancy within the last 5 years were excluded [20].
For this study, archived FFPE tissue sections were obtained from patients in the Mitomycin-C arm of the trial. All sections were reviewed by two pathologists and regions of invasive AC, AIN3, and normal/benign anal mucosa were identified and microdissected. AIN3 and normal mucosa were identified in association with adjacent invasive ACs; however, AIN3 and normal tissues were frequently derived from separate paraffin blocks. In cases in which normal or AIN3 were on the same section, the tissues were clearly spatially delineated with confirmation by 2 independent pathologists and meticulously microdissected.
Under IRB-approved protocols, fresh frozen cervical tissues including normal cervix, CIN3, and invasive CCs were identified from the Moffitt Cancer Center Total Cancer Care Biorepository. Histology was confirmed by expert pathologists and tissues were microdissected. Ten benign/normal tissues, 9 CIN3 and 9 invasive CCs were obtained for this study. The 28 tissues were derived from 26 patients, of which 2 represented tumor-normal pairs. In some cases, CIN3 was identified adjacent to CCs.

DNA processing
Genomic DNA from anal and cervical tissues was isolated using the QIAamp DNA extraction kit (QIAGEN, Valencia, CA). DNA concentrations were measured with the PicoGreen-based Qubit dsDNA HS Assay Kit (Invitrogen Cat # Q32851). Genomic DNA (500 ng) from cervical tissues underwent sodium bisulfite modification using the EZ DNA Methylation kit (Zymo Research, Orange, CA).
For anal tissues, DNA methylation quality of samples with a sufficient amount of DNA (≥250 ng) was assessed using the Illumina FFPE QC quantitative RT-PCR kit (Illumina, San Diego, CA) on the Applied Biosystems 7900HT platform. Eligible samples underwent sodium bisulfite modification using the EZ DNA Methylation kit followed by ligation using the Infinium HD FFPE DNA Restore kit (Illumina, San Diego, CA) as previously described [21].

Methylation array
Genome-wide methylation was interrogated using the Infinium HumanMethylation450K BeadChip (HM450) following the manufacturer's specifications, which included whole-genome amplification, fragmentation, hybridization, base extension, counterstaining and scanning. A Tecan Liquid Handling robot with the Te-Flow apparatus was used for single base extension and staining, and chips were scanned on a single HiScanSQ System (Illumina Inc.). The HM450 incorporates both Infinium I (methylated and unmethylated beads per CpG locus) and Infinium II assays (one bead type with the methylated state determined at the single base extension step after hybridization) to evaluate the DNA methylation status at 485,512 CpG loci, which cover 99% of annotated genes and 96% of defined CpG islands [22][23][24].
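The β-values analysed throughout are derived from the methylated and unmethylated signal intensities measured at each locus. As a minimal sketch, the conventional Illumina definition can be written in Python as below; the function name, the example intensities and the offset of 100 are illustrative assumptions, and the study itself computed β-values with minfi in R.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return the methylation beta-value for one CpG locus.

    beta = M / (M + U + offset), where M and U are the methylated and
    unmethylated signal intensities. The offset of 100 is the conventional
    Illumina stabiliser; whether it was applied in this study is an assumption.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities: mostly methylated, half methylated, mostly unmethylated.
examples = [(9000.0, 900.0), (5000.0, 4900.0), (300.0, 9600.0)]
betas = [beta_value(m, u) for m, u in examples]
```

A β-value near 1 indicates a largely methylated locus and a value near 0 a largely unmethylated one, which is the scale on which the Δβ thresholds in this study are expressed.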

HPV genotyping
The HPV status of anal and cervical DNA samples was determined using the INNO-LiPA HPV Genotyping Extra kit (Innogenetics, Gent, Belgium), which uses SPF10 consensus primers to amplify a 65 bp biotinylated fragment from the HPV L1 region. Amplicons are then denatured and hybridized with specific oligonucleotide probes.

Bioinformatics
Raw IDAT files were processed, background corrected and normalized using control probes in R using the minfi Bioconductor package. β-values with a corresponding detection p-value > 0.05 were set as missing and samples with >25% missing β-values were removed from the analysis. Methylation data from The Cancer Genome Atlas (TCGA) [25] for CCs were retrieved as raw IDATs and processed as described above. Gene expression data for CC and all other TCGA tumor types were downloaded from the Pan-Cancer Atlas at the Genomic Data Commons (GDC) (https://gdc.cancer.gov/about-data/publications/pancanatlas) and log2 transformed. All statistical tests were two-sided Student's t-tests assuming unequal variance, with false discovery rate correction (q-value) as described by Storey [26]. A differentially methylated region (DMR) was defined as follows. Cervical: q-value < 0.01, mean difference between groups > 0.4 and a minimum of four consecutive significant probes within a gene. Anal: q-value < 0.01, mean difference between groups > 0.3 and a minimum of three consecutive significant probes within a gene. CpG definitions and gene models were taken from the Illumina manifest file. Partial Least Squares (PLS) regression with a binary response (1 = tumor, 0 = normal) was used to model the difference in DMRs between normal and tumor samples [27]. Cross-validation was used to estimate the optimal number of PLS components [28]. The derived PLS models were also used to classify AIN3 cases in the anal tissue data set and CIN3 cases in the cervical tissue data set.
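The quality filter described above (mask β-values with detection p-value > 0.05, then drop samples with more than 25% masked values) can be illustrated with a short Python sketch. This is not the minfi-based R code used in the study; the function name, data layout and sample values are hypothetical.

```python
import math


def mask_and_filter(betas, det_p, p_cutoff=0.05, max_missing=0.25):
    """Mask unreliable beta-values and drop samples with too many of them.

    betas, det_p: dicts mapping sample name -> list of per-probe values,
    with the same probe order in both. Returns a dict of retained samples
    in which masked beta-values are replaced by NaN.
    """
    kept = {}
    for sample, values in betas.items():
        masked = [
            b if p <= p_cutoff else math.nan
            for b, p in zip(values, det_p[sample])
        ]
        missing_fraction = sum(math.isnan(b) for b in masked) / len(masked)
        if missing_fraction <= max_missing:
            kept[sample] = masked
    return kept


# Hypothetical data: S1 has 1 of 4 probes masked (kept), S2 has 2 of 4 (removed).
betas = {"S1": [0.8, 0.1, 0.5, 0.9], "S2": [0.7, 0.2, 0.4, 0.6]}
det_p = {"S1": [0.001, 0.2, 0.01, 0.0], "S2": [0.3, 0.4, 0.001, 0.0]}
kept = mask_and_filter(betas, det_p)
```

Masking at the probe level before filtering at the sample level keeps good probes from otherwise acceptable samples while still excluding samples that failed broadly.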
In addition, the CC progression PLS methylation model was applied for classification of normal, CIN3 and cervical tumors in three independent validation datasets: (a) HM450 BeadChip data from 307 tumor, 2 metastatic (Met) and 3 normal samples in TCGA CESC; (b) HM450 data obtained from 20 normal cervical tissues, 18 CIN3, and 6 CC samples and deposited within GEO GSE46306 [13]; and (c) HM450 data from 28 normal, 36 CIN3 and 4 tumor samples deposited in GSE99511 [18].
PCA models, PLS models, t-Distributed Stochastic Neighbor Embedding (t-SNE), methylation pattern across genes, and all statistical tests were done in MATLAB (Mathworks, Natick, MA).
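The DMR criteria given in this section can be sketched as a scan for runs of consecutive significant probes within each gene. The Python below is an illustration only, not the authors' MATLAB code. It assumes that q-values and Δβ have already been computed per probe, that the Δβ threshold is applied to each probe (the text could also be read as a mean over the run), and that probes arrive sorted by genomic position. Probe IDs and gene names are hypothetical.

```python
from itertools import groupby


def call_dmrs(probes, q_max=0.01, min_delta=0.3, min_run=3):
    """Return {gene: [runs of probe IDs]} for genes containing a DMR.

    probes: iterable of (probe_id, gene, q_value, delta_beta) tuples, sorted
    by genomic position so that probes of one gene are contiguous.
    A probe is significant if q < q_max and |delta_beta| > min_delta;
    a DMR is a run of at least min_run consecutive significant probes.
    """
    dmrs = {}
    for gene, gene_probes in groupby(probes, key=lambda p: p[1]):
        run = []
        # A non-significant sentinel flushes a run that reaches the gene end.
        for probe_id, _, q, delta in list(gene_probes) + [(None, gene, 1.0, 0.0)]:
            if probe_id is not None and q < q_max and abs(delta) > min_delta:
                run.append(probe_id)
                continue
            if len(run) >= min_run:
                dmrs.setdefault(gene, []).append(run)
            run = []
    return dmrs


# Hypothetical probes for two made-up genes.
probes = [
    ("cg01", "GENE_A", 0.001, 0.45),
    ("cg02", "GENE_A", 0.002, 0.38),
    ("cg03", "GENE_A", 0.004, 0.41),
    ("cg04", "GENE_A", 0.200, 0.10),
    ("cg05", "GENE_B", 0.001, 0.50),
    ("cg06", "GENE_B", 0.001, 0.20),
    ("cg07", "GENE_B", 0.003, 0.44),
    ("cg08", "GENE_B", 0.002, 0.47),
]
anal_dmrs = call_dmrs(probes, min_delta=0.3, min_run=3)      # anal thresholds
cervical_dmrs = call_dmrs(probes, min_delta=0.4, min_run=4)  # cervical thresholds
```

With the anal thresholds only GENE_A qualifies, since the run in GENE_B is interrupted by a probe with a small Δβ. With the stricter cervical thresholds neither gene qualifies.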

Tissues and demographics
From the 186 patients with available anal tissues, 121 invasive cases, 13 adjacent AIN3 and 9 adjacent normal mucosae yielded adequate amounts of genomic DNA (>250ng); thus, a total of 143 distinct samples were evaluated by methylation array. All cervical tissues including normal (n = 10), CIN3 (n = 9) and CC (n = 9) yielded adequate genomic DNA (>500ng) for methylation array analysis. All samples passed QC and β-value histograms are shown in S1A and S1B Fig.
The AC population consisted of 74 women and 47 men with a median age of 54 years (min-max: 25-79). Cervical tissues were derived from 26 women with a median age of 35 years (min-max: 22-68). Race distribution was predominantly white for both the cervical (81%) and anal (87%) groups. Patient demographics are presented in S1 Table.

HPV status
Of the 143 anal samples tested, 142 (99.3%) were positive for at least one HPV type while only one was negative for HPV. All 28 (100%) cervical specimens tested were HPV positive.

PCA of methylomic alterations
PCA was used to compare β-values for all 143 anal tissues (121 tumor, 13 AIN3 and 9 normal) across all probes (Fig 1A). A separation between the normal anal mucosae (blue circles) and the ACs (red triangles) can be seen in the second PCA component. There were no strong outliers or batch effects observed. AIN3 cases (in-situ cancer; grey squares), which were all derived in association with AC, were all closely clustered with ACs. In addition to PCA, t-SNE was used to cluster the samples and separation of normal from AC was also observed (S2A and S2B Fig).
PCA was used to compare the methylation β-values for the 28 cervical tissues (9 tumor, 9 CIN3 and 10 normal) across all probes (Fig 1B). A separation between the normal cervical tissue (blue circles) and CCs (red triangles) can be seen in the first PCA component. As with the anal tissue analyses, no strong outliers or batch-effects were observed. In contrast to the anal tissue analysis, CIN3s (grey squares), which were derived independently from CCs, segregated into those similar to normal epithelium and others more similar to cancer.

Identification of DMRs in AC and CC
Using the DMR-defining criteria, the comparison of normal anal mucosae with ACs resulted in 355 significant CpG loci representing 84 significant genes (S2 Table). For the cervical analysis, 36 genes, comprising 173 CpG loci, were identified that significantly distinguished CC from normal cervical tissue (S3 Table). A total of 17 genes overlapped between the two panels. These genes and their biological functions are listed in Table 1. From this panel, we arbitrarily selected 2 representative genes, ZIK1 and ASCL1 (previously identified as being differentially methylated in HPV-associated cancers), for expanded analyses [13].
Zinc finger protein interacting with K protein 1 (ZIK1), a representative significant differentially methylated gene common to both anal and cervical analyses, is visualized in Fig 2 across multiple datasets. Fig 2A and 2B show methylation levels for 15 CpG loci for each sample type in the anal (a) and cervical (b) datasets. In both anal and cervical tissues, many of the loci displayed hypermethylation in the tumor (red) samples compared to the normal (blue) samples. The AIN3 (grey) samples were hypermethylated (a), similar to the tumor samples, while the CIN3 (grey) samples had a lower degree of methylation (b). Ten of the loci located in the transcriptional start site (TSS)-1500, TSS-200 and 5'UTR regions were highly correlated with each other in both the anal and cervical datasets (Fig 2C and 2D). Fig 2E shows the correlation between the average methylation level for the 10 highly correlated loci and the RNAseq gene expression level for ZIK1 in the TCGA CC dataset. The expression of ZIK1 was observed to be lower in many of the TCGA tumor types (Fig 2F).
Fig 3 provides a representative plot of a genomic region within the Achaete-scute family bHLH transcription factor 1 (ASCL1) gene. Fig 3A presents box plots of the methylation β-values in normal (blue), AIN3 or CIN3 (grey) and tumors (red) across 21 CpG sites. Among the 21 CpG sites presented, 4 CpG sites fall within the overlapping DMRs that were significantly hypermethylated in both AC (Fig 3A) and CC (Fig 3B). Methylation levels in AIN3 samples were more aligned with AC samples, whereas methylation levels in CIN3 were similar to those of normal cervical tissues. Corresponding correlation plots between all ASCL1 probes for ACs and CCs demonstrate a high degree of correlation (Fig 3C and 3D). The pattern of the few probes with lower correlation is also similar between the anal and cervical data. The correlation between methylation and gene expression levels is lower for ASCL1, and it is noteworthy that many samples show low mRNA expression of ASCL1 in CCs from TCGA (Fig 3E). Low expression is observed across most tumor types in TCGA (Fig 3F). Similar patterns of hypermethylation in both anal and cervical cancers were observed for all 17 overlapping genes (Table 1).

Derivation of AC and CC Partial Least Squares (PLS) methylation models
PLS scoring model differentiates AC from normal anal mucosae. An AC progression PLS methylation model was derived using the selected DMRs (355 probes) and 130 samples (121 tumors, 9 normal samples) with a binary response. Cross-validation indicated that 2 PLS components were optimal. The PLS model explained 67% of the variation in X and 62% of the variation in Y (47% cross-validated). Fig 4A shows the calculated Y-values, or PLS-Score, for the normal anal and tumor samples and also the predicted Y-values for AIN3 samples. Sensitivity and specificity values for tumor vs. normal anal tissue were 0.99 and 0.78, respectively. PLS regression modeling was applied to all 143 anal tissue samples and a distinct separation was observed between the normal tissues when compared to both AIN3 and tumor samples (Fig 4A). All AIN3 samples were noted to segregate with the AC samples, which is consistent with the fact that all AIN3 were adjacent to an invasive AC.
PLS scoring model differentiates CC from normal epithelium. A PLS scoring model for CC was derived using the selected DMRs (173 probes) and 19 samples (9 tumors, 10 normal samples) with a binary response. Cross-validation indicated that one PLS component was optimal. The PLS model explained 93% of the variation in X and 95% of the variation in Y (94% cross-validated). The CC progression PLS methylation model had the ability to distinguish between normal and cancer samples with both sensitivity and specificity values being 1.0, as evidenced by the separation in calculated Y-values, or PLS-Score, for normal and CC samples (Fig 4B). When the model was applied to the 9 CIN3 samples, predicted Y-values for CIN3 were observed to segregate into separate normal-like and tumor-like cases (Fig 4B).
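To make the PLS-Score concrete, the following Python sketch fits a single-response PLS model (PLS1, NIPALS algorithm) to a binary response and predicts the score of an unseen sample. It illustrates the general technique under stated assumptions, not the authors' MATLAB implementation. The three-probe toy data, the function names and the 0.5 reference line are hypothetical.

```python
def fit_pls1(X, y, n_components):
    """Fit a PLS1 model (NIPALS) with a single response.

    X: list of samples, each a list of probe beta-values.
    y: list of responses (1 = tumor, 0 = normal).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # no covariance left between X and y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls_score(model, x):
    """Predict the PLS score (predicted Y) for one sample."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    score = model["y_mean"]
    for w, load, q in model["components"]:
        t = sum(a * b for a, b in zip(e, w))
        score += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return score


# Hypothetical beta-values for three probes in 3 normal and 3 tumor samples.
normals = [[0.10, 0.15, 0.12], [0.12, 0.10, 0.14], [0.08, 0.12, 0.10]]
tumors = [[0.60, 0.55, 0.65], [0.70, 0.62, 0.58], [0.65, 0.60, 0.70]]
model = fit_pls1(normals + tumors, [0, 0, 0, 1, 1, 1], n_components=1)
scores = [pls_score(model, x) for x in normals + tumors]
lesion_score = pls_score(model, [0.62, 0.58, 0.60])  # an unseen, tumor-like lesion
```

Because the response is coded 0 for normal and 1 for tumor, scores near 0 read as normal-like and scores near 1 as tumor-like, which is how the AIN3 and CIN3 predictions are interpreted in the text.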
Fig 2. Box plots of ZIK1 β-values in tumor (red boxes), AIN3 or CIN3 (grey boxes) and normal (blue boxes) samples. Significantly different median methylation at each CpG locus is noted as * for p<0.05 and ** for p<0.005. Among the 15 CpG sites presented, 4 CpG sites fall within the overlapping DMRs that were significantly hypermethylated in both anal (a) and cervical (b) cancers. For ZIK1, the anal in situ (AIN3) samples showed similar methylation levels to those of tumor samples, whereas for CIN3, methylation levels were similar to those of normal cervical tissues. Corresponding correlation plots between all ZIK1 probes for anal (c) and cervical (d) cancers show a high degree of correlation for ten of the probes. The average methylation levels for the ten correlated probes show high correlation (r = -0.7) to RNAseq gene expression levels in the TCGA CESC dataset (e).

Validation of the CC progression PLS methylation model
The CC progression PLS methylation model was validated in 3 independent publicly available datasets. First, we applied the model to HM450 BeadChip methylation data from 312 cervical samples in The Cancer Genome Atlas (TCGA) [25]. Despite the small number of normal cervix samples, the CC progression PLS methylation model demonstrated a robust ability to segregate normal cervix from CCs, with all but 8 cervical tumors accurately classified (false-negative rate of <3%) and all normal samples correctly classified (Fig 4C). Sensitivity and specificity values for tumor vs. normal in the TCGA dataset were 0.97 and 1.0, respectively.
Second, the CC progression PLS methylation model was applied to HM450 data obtained from 20 normal cervical tissues, 18 CIN3 and 6 CC samples and deposited within GEO (GSE46306) [13]. The PLS model performed similarly on these data as on the data obtained in this study, as the predicted PLS score was able to segregate normal from CC specimens with sensitivity and specificity values of 0.5 and 1.0, respectively (Fig 4D). Furthermore, the PLS model clustered the majority of CIN3 samples with the normal cervical samples; however, three CIN3 lesions clustered with cervical tumors and may be classified as having "tumor-like" or high-risk methylation patterns. Finally, our model was applied to the HM450 data from Verlaat et al. [18] with 28 normal, 36 CIN3 and 4 tumor samples (Fig 4E). The PLS score showed a compressed range with the highest tumor scoring 0.6; however, there was still a clear separation between normal and tumor samples, with most of the CIN3 samples scoring as normal-like. For this (GSE99511) dataset, we obtained a sensitivity of 0.25 and specificity of 1.0 when comparing normal vs. tumor samples.
In summary, analysis of CIN3 lesions using the PLS model effectively groups high-grade cervical dysplasia into subsets of CIN3 specimens that display DNA methylation patterns similar to either normal cervical tissue or invasive CC.
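The sensitivity and specificity reported for the TCGA dataset follow directly from the stated counts (307 tumors with all but 8 correctly classified, and 3 normal samples all correctly classified). The Python sketch below reproduces that arithmetic. The score cutoff of 0.5 and the placeholder score values are assumptions, as the text does not state the threshold used, and the 2 metastatic samples are left out of the calculation.

```python
def sensitivity_specificity(scores, labels, cutoff=0.5):
    """Compute sensitivity and specificity of thresholded PLS scores.

    labels: 1 = tumor, 0 = normal. A score above `cutoff` is called tumor.
    """
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s > cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s <= cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s <= cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder scores matching the reported TCGA counts:
# 299 tumors above the cutoff, 8 tumors below it, 3 normals below it.
scores = [0.9] * 299 + [0.2] * 8 + [0.1] * 3
labels = [1] * 307 + [0] * 3
sens, spec = sensitivity_specificity(scores, labels)
```

Here the sensitivity is 299/307, which rounds to the reported 0.97, and the specificity is 3/3 = 1.0. With so few normal samples, the specificity estimate carries wide uncertainty.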

Cross-application of cervical and AC progression PLS methylation models
Given that 17 differentially methylated genes overlapped between CC and AC, and that the Δβ-values were similar in both the anal and cervical datasets (Fig 4F), the cross-applicability of the two PLS methylation models was examined. When the AC progression PLS methylation model, which included all 355 probes, was applied to the cervical methylation data, a high correlation was observed with the CC progression PLS model, which included 173 CpG probes (Fig 4G). Conversely, applying the CC progression PLS methylation model to the anal methylation data yielded a similarly high correlation (Fig 4H).
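The cross-application compares, sample by sample, the score assigned by one model with the score assigned by the other. A minimal Python sketch of such a comparison using the Pearson correlation coefficient is shown below. The text does not name the correlation measure used, so Pearson is an assumption here, and the score values are hypothetical.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical PLS scores for the same five cervical samples under each model.
ac_model_scores = [0.05, 0.10, 0.45, 0.80, 0.95]
cc_model_scores = [0.02, 0.12, 0.40, 0.85, 1.00]
r = pearson_r(ac_model_scores, cc_model_scores)
```

A coefficient close to 1 means the two models rank and space the samples almost identically, which is the sense in which the signatures are described as interchangeable.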

Discussion
HPV-associated cancers remain an important health issue in the US with the incidence of AC continuing to rise and CC and oropharyngeal cancer comprising the largest proportion of such malignancies [1]. Unlike for AC, screening and prevention guidelines for CC are well developed and have been successful in reducing the number of new cases [29]. However, given the high prevalence of HSIL/CIN2 and an inability to predict regression of CIN2/3 lesions, it can be presumed that a substantial number of women may be overtreated with excision or ablation.

Fig 3. Box plots illustrate the median (dot) and interquartile ranges [25th (lower boundary of box) and 75th (upper boundary of box) percentiles] of β-values in tumor (red boxes), AIN3 or CIN3 (blue boxes) and normal (green boxes). Significantly different median methylation at each CpG locus is noted as * for p<0.05 and ** for p<0.005. Among the 21 CpG sites presented, 4 CpG sites fall within the overlapping DMRs that were significantly hypermethylated in both anal (a) and cervical (b) cancers. For ASCL1, anal in situ (AIN3) samples showed similar methylation levels to tumors, while CIN3 methylation levels were similar to those of normal cervical tissues. Corresponding correlation plots between all ASCL1 probes for anal (c) and cervical (d) cancers show a high degree of correlation for all of the probes. The average methylation levels for 17 probes show low correlation (r = -0.3) to RNAseq gene expression level in the TCGA CESC dataset (e). The tumor vs. normal gene expression levels across multiple TCGA tumor types are shown (f, * p<0.05, ** p<0.01, *** p<0.001 & **** p<0.0001). https://doi.org/10.1371/journal.pone.0260857.g003

PLOS ONE

Although there are similarities in the underlying carcinogenesis, AC screening standards remain less evolved than those for CC [6]. It has been suggested that high-risk populations such as HIV+ individuals, MSM, and immunosuppressed patients should be considered for AC screening [30]. However, the adaptation of cervical screening to the anus, such as High Resolution Anoscopy (HRA) [31] following abnormal anal cytology, has limitations due to differences in anatomic site that impact the screening procedure [6,32]. These limitations were highlighted in a study of 138 HIV+ MSM diagnosed with AC, of whom over half had participated in HRA-based screening prior to diagnosis; a significant number of the ACs identified were located where a preceding AIN3 had been treated [33]. Similar to CC, there is also a concern regarding overtreatment by aggressive use of HRA and ablation for AIN3/HSIL, especially among high-risk populations. Although AIN3/HSIL lesions are potential precursors for malignant transformation, it is clear that the majority of such lesions do not progress to invasive disease [34]. The AC HSIL Outcomes Research (ANCHOR) study, which initiated patient accrual in 2014, is a randomized screening trial that may provide insight into the benefit of identification and treatment of HSIL through screening and HRA as it relates to the progression of HSIL to malignancy [35,36]. Ongoing efforts to better refine screening will be needed to supplement the findings of ANCHOR.
Molecular biomarkers for identifying HPV-associated pre-cancerous lesions at high risk for progression to malignancy would be valuable for enhanced screening, targeted prevention and to reduce overtreatment. Approaches to identify epigenetic alterations in HPV-associated cervical and anal cancer have evolved from a targeted approach of known epigenetic targets in cancer to larger panels of targeted genes. Early work focused on panels of small numbers of selected cancer-specific genes, such as APC, CALCA, CNNA1, C13ORF18, DAPK1, ESR1, RARB, SLIT2 or WIF1, to differentiate normal cervix from CC [37][38][39]. In subsequent work using a high-throughput qMSP-based targeted approach, investigators from the University of Groningen examined a targeted panel of 213 cancer-specific genes and derived a 4-gene panel (JAM3, EPB41L3, TERT and C13ORF18) that was able to discriminate CIN3 and CC more accurately than conventional cytology [40]. Lendvai et al. [41] used Methylated DNA Immunoprecipitation (MeDIP) combined with DNA microarray to identify two differentially methylated regions of the COL25A1 and KATNAL2 genes as having significantly progressive methylation with increasing severity of CIN compared with normal cervical epithelium. Using the MethylCap-Seq platform for genome-wide methylation analysis, Boers et al. identified 8 new candidate methylated markers that distinguished CIN2/3 from normal cervix (ZSCAN1, ST6GALNAC5, ANKRD18CP, CDH6, GFRA1, GATA4, KCNIP4, and LHX8) [42]. When combined with their previously identified 4-gene panel, C13ORF18, JAM3 and ANKRD18CP were the best discriminatory combination of methylated genes for detection of high-grade CIN [42,43]. Verlaat et al. applied next-generation sequencing of methyl-binding enriched DNA (MBD-Seq) and identified 3 methylated genes (GHSR, SST and ZIC1) that, in combination with 3q chromosomal gain, showed a high rate of detection of high-grade CIN [17]. Subsequently, that same group applied the HM450 platform to identify differential methylation between normal and CIN3 tissues (and no cervical cancer), which yielded 12 candidate markers that were then narrowed down to a 3-gene classifier (ASCL1, LHX8 and ST6GALNAC5) for the detection of CIN3 in high-risk HPV+ self-samples [18].

Fig 4. The CC progression PLS model applied to the cervical dataset differentiates the normal and tumor samples, with 3 of the CIN3 samples scoring as tumor-like and 6 as normal-like (b). The CC progression PLS model was further validated on three additional datasets. In the TCGA cervical dataset, the normal (n = 3) samples scored low, the metastatic (Met, n = 2) samples scored as tumors and all but 5 tumor (n = 307) samples scored high (c). In the GSE46306 cervical dataset, all of the normal (n = 20) samples scored as "normal" and most of the tumor (n = 6) samples scored as tumors, while most of the CIN3 (n = 18) samples scored as "normal-like" with several being classified as "tumor-like" (d). Finally, in GSE99511 all normal cases (n = 28) scored appropriately while tumors (n = 4) scored higher, with the majority but not all CIN3 cases (n = 36) scoring as "normal-like" (e). Density scatter plot of Δβ-values for tumor versus normal for cervical tissues on the x-axis and anal tissues on the y-axis (f). The high correlation indicates that the Δβ-values are similar when comparing the progression of both cervical and anal cancers. This was further explored by applying the AC progression PLS model to the cervical dataset and comparing it with the CC progression PLS model (g). The high correlation implies that the methylation changes are similar between cervical and anal cancers. This was further corroborated when the CC progression PLS model was applied to the anal dataset and a similar high correlation was observed (h). https://doi.org/10.1371/journal.pone.0260857.g004

In this study, we used stringent analytical methodology that reduced genome-wide HM450 epigenetic data into a common 17-gene epigenetic classifier for CC and AC. Similar to our approach, Farkas et al. were among the first to apply the same HM450 platform for a comparison of differential methylation between 6 cervical cancers, 18 CIN3 and 20 normal cases [13]. They identified 6 genes as the best candidate methylated biomarkers of cervical cancer progression (RGS7, LHX8, ST6GALNAC5, TBX20, KCNA3 and ZSCAN18). Both this report and that of Farkas et al. [13] identified targets of differential methylation between cancer and normal tissue and examined whether those DMRs were able to identify higher-risk high-grade cervical and anal neoplasia. This is an alternative yet complementary approach to work published by the University of Groningen group, which focused only on differentiating CIN2/3/HSIL from normal cells. Our joint CC and AC classifier, and the cancer-specific classifiers reported herein, independently identified DMRs in genes that have previously been identified as being epigenetically altered in CIN2/3, mostly by targeted panel approaches. The most prominent of these include overlap of ASCL1 and WDR17 [12,18] in our joint model and ZNF582, ST6GALNAC5, and C13ORF18 in the disease-specific models [13,15,18,42]. Of note, our joint classifier contains ASCL1, which is one of the three markers identified by Verlaat et al. using HM450 arrays to detect CIN3 in high-risk HPV+ self-samples [18] and was also reported as an epigenetic marker in a gene panel developed for detection of oral SCC, another HPV-associated cancer [44]. Hypermethylation of PAX1 has been reported as a candidate methylated biomarker for oral dysplasia/cancer detection and used to differentiate normal cervical mucosa from CC specimens [45]. Such shared methylation changes suggest that HPV induces a number of non-random changes in the host methylome that may, in turn, contribute to carcinogenesis. As noted, such alterations have the potential to be leveraged as clinical biomarkers and/or therapeutic targets [46].
The identification of shared differentially methylated targets leading to mutually applicable progression signatures between cervical and anal squamous carcinogenesis is a significant strength of this study. Given that the methylation signatures were derived from completely different sources, were preserved differently (fresh frozen vs. FFPE), processed and analyzed by methylation arrays at different times, it is highly compelling that they ultimately yielded a remarkable number of shared methylation targets as well as interchangeable cancer progression signatures. This certainly speaks to the robustness of the PLS methylation models. Furthermore, the CC progression PLS model was validated using three independent datasets [12]. In addition, we were the first group to apply the HM450 platform for the analysis of invasive AC [11] and to our knowledge, this study is the first to report the comprehensive application of HM450 to a patient sample set to directly derive a progression signature for anal neoplastic progression. In summary, using a comprehensive approach, we built and developed a PLS model-based classifier in CC, cross-validated the classifier using external open-source data and then, based on the high correlation between methylation status in CC and AC, we integrated the CC and AC models to develop a joint PLS classifier.
The shared CC and AC PLS methylation model (differentiating normal tissue from cancer) was able to segregate CIN3 or AIN3 cases into normal-like and cancer-like groups. The application of this model has potential implications for risk stratification of high-grade lesions. This concept of using methylated genes to define cancer risk heterogeneity among CIN2/CIN3 lesions was also described by Verlaat et al. [18,47]. Interestingly, by arbitrary application of a similar panel of methylation markers of cervical neoplastic progression to anal cancer progression, van der Zee et al. have demonstrated not only the detectability of AIN3 in both HIV+ and HIV- patients but also the ability to define potential patterns of higher cancer risk in these high-grade lesions [12,19,48].
As part of our shared AC/CC classifier, we identified a number of genes that are known to play a role in malignancies or key cancer-related cellular functions such as cell cycle regulation and cell adhesion. SORCS3, ZNF154 and ZNF177 are hypermethylated in gastric cancers [49][50][51], while ZNF177 is also hypermethylated in hepatocellular carcinomas [52]. Frequent methylation of MIR129-2 has been observed in both lymphoid malignancies and during the progression of monoclonal gammopathy of unknown significance (MGUS) to multiple myeloma [53]. In T-cell malignancies, recurrent missense and nonsense mutations have been identified in WDR17 [54]. FMN2 regulates cell cycle progression from G1 to S phase by regulation at p21 [55] and modulates adhesion stability via actin bundle regulation [56]. With respect to other genes, ZIK1 is a transcriptional repressor [57], ASCL1 induces tight junction protein Cldn5 [58] and MARCH11 is expressed in the testes and enhances lysosomal degradation and delivery [59].
Our study has certain inherent limitations. First, the CC progression PLS model was derived from a small sample set; however, the model was validated in three larger CC datasets. Second, the AC progression PLS model was generated in a retrospective analysis; however, the source of the AC patient population and tissues was part of a prospective national trial with systematic data and tissue collection methodology. Third, the AC and CC specimens differed by preservation method (FFPE vs. fresh frozen, respectively); therefore, a combined PLS model could not be generated. In addition, the number of analyzable non-cancer anal tissue samples was small but reflected the inherent rarity of co-existing normal, AIN3 and invasive cancer samples from a common patient population. Although the AIN3 specimens were derived from tissue adjacent to or in association with invasive cancer, we would emphasize that both the AIN3 and normal tissues were frequently derived from separate paraffin blocks and, if they were from the same section, they were clearly spatially delineated, with confirmation by 2 independent pathologists and subsequent meticulous microdissection. It is acknowledged that there could be a field effect of similar methylation patterns (or contamination) adjacent to invasive cancer. The fact that adjacent normal tissues were distinguishable from tumor across 355 significant CpG loci representing 84 significant genes supports the conclusion that this PLS model is not overly influenced by invasive tumor field effect or contamination. However, the similarity between AIN3 and AC may be higher than that between normal tissue and AC, which may have led to the co-segregation of AIN3 with invasive ACs. Notably, in the cross-application of AC to CC, the AC signature was able to independently dichotomize CIN3 cases into normal-like and cancer-like specimens, which provides additional reassurance.
It is well known that both cervical and anal mucosa, when infected by HPV, follow a predictable progression from intraepithelial neoplasia to invasive malignancy. Due to the current inability to specifically distinguish high-grade lesions that will progress to malignancy, ablative procedures are routinely recommended on all such lesions for the prevention of AC and CC. The identification of segregating biomarkers to better select those high-grade lesions which should be treated or observed would have a high clinical impact. It has been shown that there are distinct and shared methylation changes that occur across the genome during the progression of AC and CC. The profile of epigenetic alterations between these two cancer types is highly similar, suggesting that HPV-driven oncogenesis may result in similar non-random methylomic events. Herein, we identified shared epigenetic alterations between AC and CC and developed an integrated joint PLS methylation model for both CC and AC. This has implications for the future development of shared biomarkers as well as epigenotype-phenotype associations. Larger-scale validation studies and evaluation in other HPV-associated malignancies are warranted.
Supporting information
S1 Fig. a-b. β-value histograms for anal and cervical tissues, respectively. The β-value histograms for the anal dataset (a) all show a bimodal distribution, with some tumor samples (red line) demonstrating a third peak and some degradation observed. A similar trend was observed for the cervical dataset (b) but without degradation. This is likely attributable to the fact that the anal specimens were FFPE while the cervical samples were fresh frozen. (TIF)
S2 Fig. a-b. t-SNE was used to cluster the samples and separated normal (blue circles) from the tumor (red triangles) samples. AIN3 (grey squares) samples tended to be interspersed among the tumor samples in the anal dataset (a), while CIN3 (grey squares) cases segregated more closely to normal cervical tissues (b). (TIF)
S1