Identification of Protein Biomarkers for Cervical Cancer Using Human Cervicovaginal Fluid

Objectives Cervicovaginal fluid (CVF) can be considered as a potential source of biomarkers for diseases of the lower female reproductive tract. The fluid can easily be collected, thereby offering new opportunities such as the development of self tests. Our objective was to identify a CVF protein biomarker for cervical cancer or its precancerous state. Methods A differential proteomics study was set up using CVF samples from healthy and precancerous women. Label-free spectral counting was applied to quantify protein abundances. Results The proteome analysis revealed 16 candidate biomarkers of which alpha-actinin-4 (p = 0.001) and pyruvate kinase isozyme M1/M2 (p = 0.014) were most promising. Verification of alpha-actinin-4 by ELISA (n = 28) showed that this candidate biomarker discriminated between samples from healthy and both low-risk and high-risk HPV-infected women (p = 0.009). Additional analysis of longitudinal samples (n = 29) showed that alpha-actinin-4 levels correlated with virus persistence and clearing, with a discrimination of approximately 18 pg/ml. Conclusions Our results show that CVF is an excellent source of protein biomarkers for detection of lower female genital tract pathologies and that alpha-actinin-4 derived from CVF is a promising candidate biomarker for the precancerous state of cervical cancer. Further studies regarding sensitivity and specificity of this biomarker will demonstrate its utility for improving current screening programs and/or its use for a cervical cancer self-diagnosis test.


Introduction
The human papillomavirus (HPV) is responsible for virtually all cervical cancers [1]. Although more than 150 variants of this virus exist, only certain genotypes, such as HPV 16, 18, 33, 45 and 58, are known as high-risk types (HR-HPV) [2]. Low-risk HPV types (LR-HPV), mainly HPV 6 and 11, seldom cause genital tumors; however, they do cause condylomata acuminata (anogenital warts) [3]. Persistent HPV oncoprotein expression (E6/E7) in HPV-infected epithelial basal cells deregulates cell division [4]. Overexpression of these viral genes deregulates cell proliferation, metabolism, apoptosis and differentiation and causes genomic instability, all of which may lead to consecutive stages of cervical intra-epithelial neoplasia (CIN1, 2 and 3) or squamous intra-epithelial lesions (low-grade SIL and high-grade SIL). Approximately 85% of all sexually active persons will be exposed to HPV, and the majority of all HR-HPV infections are virion-producing and limited in time, suggesting that, in addition to the viral genotype, several cofactors are closely associated with the persistence of the viral oncoproteins and the transformation of the cervical mucosa to malignant tissue [5,6].
There are several reasons why the development of a diagnostic assay, based on cervicovaginal fluid (CVF), for early detection of cervical cancer would be beneficial. First, current screening assays are not optimal. Because of the high prevalence of cervical cancer, females between the ages of 25 and 65 are frequently screened using Pap(anicolaou) smears, a test which is based on detection of morphological changes of cervical mucosal cells. Unfortunately, the low sensitivity of this test can result in cervical cancer being diagnosed at a late stage [7]. Data from randomized trials show that HPV DNA screening is more sensitive than cytology, but its specificity is lower, which will inevitably result in overtreatment because about 80% of HPV-positive patients spontaneously clear the virus [3]. Additionally, colposcopic examination is labor-intensive, difficult to automate and vulnerable to inter- and intra-observer variation [8]. The introduction of a new biomarker that helps improve the sensitivity and specificity of current screening programs is therefore more than welcome. The second reason lies in the applicability of CVF for diagnosis. Because the fluid can easily be collected by the individual with the aid of special devices [9], a fast and simple test on the basis of a dipstick assay, performed by the woman herself, could be developed. Such a self test could be introduced, e.g., in resource-limited populations, where an unequal burden of cervical cancer often occurs. Indeed, it is estimated that only 5% of women in low-resource countries are screened appropriately for cervical cancer [10]. Lack of healthcare infrastructure and financial cost are the main reasons why cytology-based programs are not implemented. Since alternative methods are currently being investigated in these regions, use of a CVF-based self-diagnosis test could be considered.
In addition, self tests could also be used in follow-up studies of vaccination programs since the effect of vaccination in sexually active women is uncertain [11]. Moreover, current vaccines only address the neoplasia-inducing genotypes HPV16 and HPV18, suggesting that under ideal circumstances, only a maximum of 70% of all cervical cancers can be prevented [7,12]. Thus, at least for the coming decades, careful evaluation of vaccination programs is highly recommended, and a simple self-diagnosis test could help considerably in meeting this need.
In the past, several proteomics studies were performed to identify biomarkers for the (early) detection of cervical neoplasia. In different studies, precancerous and healthy cervical tissue was compared and Lee et al. studied the in vitro contribution of oncoprotein E7 to the development of cervical neoplasia using an HPV-infected cell line [13-20]. Several markers that were linked to dysregulation of the normal cell cycle, such as p16, Ki67 and cyclin E, were characterized. These findings resulted in improved understanding of HPV-induced carcinogenesis but so far they all showed low sensitivity and/or specificity and therefore could not be used in the clinic.
For biomarker discovery and validation, body fluids are better than tissue samples because the following advantages are inherent to most body fluids: (1) easy accessibility; (2) avoidance of sampling risks; and (3) multiple sampling potential [21]. Today, plasma is the most analyzed biological fluid in the search for biomarkers and it has already been used in cervical cancer research [22,23]. However, plasma comes into contact with nearly all organs of the body, making plasma biomarkers less specific because it is difficult to determine from which organ they originate [21]. CVF can be obtained by cervicovaginal lavages, vaginal swabs, tampons or with a device for self-sampling [9,24] and has the following benefits over plasma: (1) it possesses a lower fluid volume/organ-ratio compared to plasma, resulting in higher sensitivity; and (2) it comes into contact with fewer organ systems, which most likely increases the specificity of the markers [21,25]. In this study, we focus on the identification of candidate biomarkers for cervical neoplasia by comparing the protein composition of individual CVF samples from healthy and precancerous (LSIL and HSIL) women. Our findings indicate that alpha-actinin-4 (ACTN4) is a promising candidate biomarker for the precancerous state of cervical cancer and that this CVF protein is very well suited for the development of a self-diagnosis test.

Study design and sample collection
All patients agreed to participate by written consent and the study was approved by the ethical committee of the Antwerp university hospital (registration number: B30020108372). Study-specific patient identification codes were assigned and transmitted in such a manner that patient confidentiality was preserved.
All samples in this prospective, blinded, cohort study were taken by the same gynaecologist (WAAT). In case of abnormal Pap smear results (due to HPV infection or as a false positive outcome), both healthy and precancerous patients are routinely subjected to a colposcopic examination, a procedure that includes rinsing the vagina with 5% acetic acid. After colposcopy this washing fluid (containing the cervicovaginal fluid) is normally discarded but was collected for proteomic analysis. Additionally, cervical cytology samples were collected during this colposcopic examination to determine the cytology and HPV status using type-specific PCRs as previously described [26]. All BD-SurePath liquid based cytology samples were sent to the pathology laboratory (RIATOL, Department of Molecular Diagnostics, Sonic Healthcare Benelux, Antwerp, Belgium) for processing. Based on the combination of colposcopic results, cytology outcome and HPV genotyping, healthy and precancerous women could be identified in a reliable manner. We selected samples originating from 6 healthy (normal colposcopy/cytology and HPV negative; group A) women and 6 precancerous (abnormal colposcopy/cytology and HR-HPV positive; group B) individuals (Table 1). All samples were derived from postmenopausal (> three years after the last menses) women of similar age (59 ± 13 years). Patients who used medication were excluded from the study. In addition, patients who showed visible signs of gynaecological infections (e.g. gonorrhoea) or had an HIV infection were excluded as well. Based on the cervicovaginal fluid samples, patients were screened for Trichomonas vaginalis. During proteomic analyses, healthy and precancerous CVF samples were always analyzed in pairs to minimize technical variations of the LC-MS/MS platform.

Proteomic analysis
After collection, the cervicovaginal lavages (25-40 ml) were immediately transported on ice to the laboratory and stored at −80 °C. After centrifugation, the supernatant was concentrated by lyophilization to a final volume of approximately 200 µL. Protein samples (1 mg/sample) were separated and fractionated in the first dimension on a reverse phase (RP) protein C4 HPLC column. Enzymatic digestion with trypsin was performed and the resulting peptides of each fraction were separated in a second dimension on a RP-C18 micro-capillary HPLC system. Mass spectrometric analysis was performed using a MALDI-ToF/ToF instrument. The resulting MS/MS spectra from each sample were searched against the human Swiss-Prot database (version: 57.1) using the MASCOT search engine. Analysis of the obtained datasets was performed as previously described [25]. More detailed information can be found in Document S1 and Figure S1.
Additionally, several longitudinal samples (n = 29) were also tested for these two candidate biomarkers.

Statistics
To determine whether the difference in protein abundances was significant, normalized spectral abundance factors (NSAF values) were subjected to statistical testing. Because the selected proteins were identified in several samples, and therefore each had several NSAF values, a non-parametric Mann-Whitney U-test was used. Moreover, a chi-square test was performed, whereby the frequency of all identified proteins was analyzed to determine whether certain proteins were detected more often or exclusively under certain conditions. Unpaired Student's t-tests were performed to analyze the ELISA results. All statistical tests were performed using the Statistical Package for the Social Sciences (SPSS version 18).
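As an illustration of the quantity being tested, the following is a minimal sketch of the NSAF calculation as it is commonly defined in the spectral-counting literature: each protein's spectral count is divided by its length, and the result is normalized by the sum of these ratios over all proteins in the sample. The function name, the input layout, and the example counts and lengths are hypothetical and are not taken from this study.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF values for one sample.

    spectral_counts: dict mapping protein ID -> number of MS/MS spectra
    lengths: dict mapping protein ID -> protein length in amino acids
    """
    # Spectral abundance factor: spectra per residue for each protein
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra in sample")
    # Normalize so that the values of one sample sum to 1
    return {p: v / total for p, v in saf.items()}


# Hypothetical sample with three proteins (illustrative numbers only)
example = nsaf(
    {"protA": 10, "protB": 5, "protC": 5},
    {"protA": 500, "protB": 250, "protC": 1000},
)
```

Because the values are normalized within each sample, NSAF values can be compared between samples that yielded different total numbers of spectra.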

Differential study for the characterization of cervical cancer biomarkers
To identify differentially abundant proteins, 12 samples were divided into two groups and were individually analyzed using an analytical proteomics platform (Table 1). Group A comprised 6 samples from healthy individuals, whereas group B consisted of 6 samples from precancerous individuals. Proteomic analysis resulted in 846 and 825 identifications for group A and group B, respectively, representing 371 (group A) and 341 (group B) nonredundant proteins (Figure 1). The mean overlap between all samples from group A was 54% and from group B was 53% (Table S1).
A semi-quantitative analysis was also performed based on the label-free NSAF method (Table S2). For both groups, the NSAF values of the proteins present in at least 5 of the 6 samples were mutually compared. The variation in protein abundance between samples was expressed as the coefficient of variation (CV). A mean CV value of 60% (lowest: 19%; highest: 134%) was obtained for group A (healthy), while a mean CV of 52% (lowest: 14%; highest: 99%) was noted for group B.
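The coefficient of variation used above is the standard deviation of a protein's abundance values expressed as a percentage of their mean. A minimal sketch follows; the text does not state whether the sample or the population standard deviation was used, so the sample form is an assumption here, and the input values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical NSAF values of one protein across six samples
cv_example = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0])
```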

Identification of differentially abundant proteins
To identify candidate biomarkers that correlate with the precancerous state of cervical cancer, healthy and precancerous CVF proteomes were compared. A qualitative analysis was performed between the 371 identified proteins for the healthy group and the 341 identifications for the precancerous group, which resulted in 238, or 67%, overlapping proteins. To identify proteins with a statistically significant difference of occurrence, all identifications were subjected to statistical testing using a Pearson chi-square test. Proteins with a p value ≤ 0.05 are listed (Table 2). From these calculations, ACTN4 was identified in all samples from precancerous patients but not in the samples from healthy patients. Moreover, ACTN4 was primarily identified based on several unique peptides with a mean MASCOT protein score of 162 (Table S3). The difference in the abundance of this protein was further verified by ELISA (see below).
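The presence/absence comparison for ACTN4 (detected in 6 of 6 precancerous samples and 0 of 6 healthy samples) can be sketched as a Pearson chi-square test on a 2x2 table. This is a minimal illustration, not the SPSS procedure itself: it assumes the uncorrected Pearson statistic with one degree of freedom, and the text does not state whether a continuity correction or an exact test was applied. The function name is hypothetical.

```python
from math import erfc, sqrt


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2))
    return chi2, erfc(sqrt(chi2 / 2))


# Rows: precancerous, healthy; columns: ACTN4 detected, not detected
chi2, p = pearson_chi_square_2x2(6, 0, 0, 6)
```

Under these assumptions the statistic is 12 and the p value is on the order of 0.0005, which is in line with the reported significance for ACTN4 but need not match the SPSS output exactly.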
In addition, a semi-quantitative comparison was performed to determine whether certain proteins that were identified in both the healthy and the precancerous groups were differentially expressed. The NSAF values of corresponding proteins were statistically tested using a Mann-Whitney U-test. Only proteins with a statistical significance of p ≤ 0.05 are listed (Table 3). This calculation resulted in four proteins that were characterized by a significant difference in abundance between both conditions. Only one protein, haptoglobin, showed a downregulation in the precancerous condition, whereas three other proteins, ATP synthase subunit beta, annexin A2 and PKM2, were upregulated. PKM2 and annexin A2 showed the lowest p values (0.01 and 0.03, respectively) and were detected in 5 of 6 samples from the precancerous group. Because highly variable proteins are not suitable candidate biomarkers, the variation in the abundances of these proteins was calculated. With CV values of 40% and 29% for the healthy and precancerous group for PKM2, respectively, this protein is more suitable as a candidate biomarker compared to annexin A2, which had values of 77% and 48%, respectively.
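For groups as small as those compared here, the Mann-Whitney U-test can be evaluated exactly by enumerating every way of splitting the pooled values into two groups. The sketch below illustrates that idea under stated assumptions: it is a two-sided exact test, whereas the text does not say whether SPSS reported an exact or an asymptotic p value, and the NSAF values and function names are hypothetical.

```python
from itertools import combinations


def u_statistic(x, y):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact(x, y):
    """Two-sided exact Mann-Whitney U-test by enumerating all regroupings.

    Returns (U, p_value). Practical only for small groups such as 6 vs 6.
    """
    pooled = list(x) + list(y)
    n, total = len(x), len(x) + len(y)
    center = len(x) * len(y) / 2  # expected U under the null hypothesis
    observed = u_statistic(x, y)
    extreme = count = 0
    for idx in combinations(range(total), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(total) if i not in chosen]
        count += 1
        # Count regroupings at least as far from the center as the observed U
        if abs(u_statistic(gx, gy) - center) >= abs(observed - center) - 1e-12:
            extreme += 1
    return observed, extreme / count


# Hypothetical NSAF values for one protein (illustrative numbers only)
healthy = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2]
precancerous = [2.9, 3.4, 2.7, 3.8, 3.1, 3.6]
u, p = mann_whitney_exact(healthy, precancerous)
```

With two fully separated groups of six, only 2 of the 924 possible regroupings are as extreme as the observed one, which is the smallest two-sided p value this test can return for that design.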

Verification of the candidate biomarkers
An ELISA was performed to verify the differential profiles of ACTN4 and PKM2. Healthy (n = 16) and HPV-infected samples (both HR-HPV oncogenic (n = 8) and LR-HPV non-oncogenic samples (n = 4)), which were not included in the previous experiment, were tested using commercially available kits (Table 4).
Levels of ACTN4 were significantly higher in samples from HR-HPV infected individuals compared to samples from healthy, non-HR-HPV infected patients (p = 0.023), which confirmed our LC-MS/MS results. Remarkably, sample one, originating from a non-HR-HPV infected patient, appears to be an outlier because of the high ACTN4 concentration of 85.6 pg/ml (see discussion). Additionally, the LR-HPV infected patients had higher ACTN4 concentrations compared to the non-HPV-infected patients (p = 0.052), but somewhat lower values compared to the HR-HPV infected patients (p = 0.114). Comparison of the ACTN4 levels in HPV-infected (both HR- and LR-infections) versus non-infected samples (healthy patients) showed a significant difference, with a p value of 0.009 (Figure 2). The highest ACTN4 concentration for the healthy group (except for the outlier) was 17.3 pg/ml, whereas the lowest concentration for the HR-infected group was 30.6 pg/ml. The lowest concentration of ACTN4 in the LR-infected group was 21.3 pg/ml, which is higher than the highest value of the healthy group. Based on these observations, a threshold of 18 pg/ml can be set as the highest value of ACTN4 for the non-infected condition. No linear correlation was found between the viral load and the absolute concentration of ACTN4, which suggests that ACTN4 is not just a marker for HPV infection but rather for the precancerous stage.
In addition, the most promising semi-quantitative candidate, PKM2, was verified with a commercial ELISA kit. For this analysis, the same 28 samples were used; however, the LC-MS/MS results were not confirmed (data not shown).
We then analyzed the ACTN4 concentrations for an additional 29 longitudinal samples (i.e., samples from the same patient at several time points) from 9 patients. For each patient, a minimum of three longitudinal samples was evaluated (Figure 3). Of these nine patients, five patients cleared the virus, two patients had persistent HPV infection, one patient acquired a new HPV infection and one patient was healthy. Although the numbers of patients per group are small, these samples can help us to confirm our findings. An ACTN4 ELISA was performed on several of the samples to identify correlations between the virus profiles and ACTN4 levels within one patient. However, in addition to the virus titer, we also expected the genotype and time of infection to influence the ACTN4 expression because different virus types have different carcinogenic capacities and cervical cancer usually develops after a persistent HR-HPV infection [2]. Recently, it was shown that the development of cervical precancer (cervical intraepithelial neoplasia of grade three; CIN3) is preceded by a steady increase in the viral load of a given HR-HPV type, whereas a rapid exponentially increasing load is generally cleared within 6 to 18 months and is usually associated with low-grade cytological abnormalities [27].
Table 2. Proteins with a statistically significant difference in frequency between the healthy and precancerous group.
Table 3. Proteins with a statistically significant difference in abundance between the healthy and precancerous group.
With these factors in consideration, the following two trends were observed from the experiments: (1) patients who had an early-stage infection (patient 1) or cleared the virus (patients 2, 6, 7 and 9) had increasing or descending ACTN4 levels, respectively, and patients with no or low infections (patient 5) or continuous infections (patients 4 and 10) had stable low or high ACTN4 levels, respectively; and (2) for a persistent infection (patients 1, 4 and 10), the ACTN4 accumulated.

Characterizing the CVF proteome of healthy and precancerous patients
In this study the CVF proteome from 12 (6 healthy and 6 precancerous) individual samples was characterized to identify candidate protein biomarkers for the detection of the precancerous state of cervical cancer. The analysis of these 12 samples was performed at both the qualitative (present/absent) and semiquantitative level. The number of protein identifications for each of the samples is very consistent with previously performed proteomic studies on CVF [28].
The determination of the amount of overlapping proteins between the samples within one group showed significant differences. Only 10% of all proteins were observed in each individual sample from both groups. This group consists primarily of proteins that are characteristic of the CVF proteome, based on their localization and biological function. We therefore call these proteins ''characteristic proteins''. However, a significant number of identifications, such as intracellular proteins, are less characteristic of the CVF proteome. This finding can be explained by processes such as the disruption of the epithelial cell layer during sampling and the shedding of dead epithelial cells. We therefore group these identifications as ''non-characteristic'' proteins. Based on these findings, which are common to almost every comprehensive CVF proteomics study, one can make a distinction between a ''core proteome'' of frequently identified characteristic CVF proteins and a ''variable proteome'' that contains the noncharacteristic CVF proteins [28]. Compared to the mean CV values of the common identifications between three technical replicates (25%) [25], the mean CV values of protein abundance in the healthy and the precancerous samples (60% and 52%, respectively) were significantly higher. This result was expected because, in contrast to the technical replicates, the samples were not identical. In all samples, proteins with high inter-individual differences were observed (up to 134%), whereas the measured mean CV value of the ''characteristic proteins'' was 38%. These results are similar to other proteomic studies of body fluids. For example, Yamakawa et al. [29] found a mean CV value of 63% for the seminal plasma proteome, and according to Anderson et al. [30], the mean CV value of a series of plasma proteins was approximately 45%.

Identification of differentially abundant proteins
To examine differentially abundant proteins, a label-free spectral counting method was used. As this method allows only for a semi-quantitative analysis, outstanding effects were first investigated as these usually give the most reliable results. Therefore, we first pursued proteins whose presence or absence was characteristic of a given condition. Statistical testing (chi-square test) resulted in 12 proteins with a significant difference of incidence (p < 0.05), of which 6 proteins were exclusively identified in precancerous samples (Table 2). The most significant protein was ACTN4 (Acc No: O43707), which was identified in all precancerous samples but none of the healthy samples (p = 0.001). Notably, SCCA-1, often used as plasma biomarker for cervical cancer [31], was also identified, albeit with a higher p value (p = 0.046).
ACTN4 is a critical component of the cytoskeleton and is involved in the formation of cell-cell adhesions [32]. It has been shown that the inhibition of certain membrane bound adhesion molecules caused by a malignant transformation results in a reduction of cell-cell adhesion strength and upregulation of ACTN4 [33]. Based on a knockdown study of ACTN4 using breast cancer cells, Khurana et al. demonstrated that ACTN4 was involved in the control mechanism of cell growth [34]. Moreover, the cytoplasmic accumulation of ACTN4 was associated with several malignancies and was linked with a bad prognosis [35], and an accumulative genomic gain of ACTN4 was associated with the formation of ovarian adenocarcinomas [36]. Also, RNAi-mediated downregulation of ACTN4 showed an association between the overexpression of ACTN4 and an aggressive phenotype of oral squamous cell carcinoma [37]. Recently, the protein was found to be associated with several factors with varying functions, suggesting other roles for this protein, apart from actin regulation [38]. Indeed, Khurana et al. identified an LXXLL motif that was responsible for interaction of ACTN4 with the estrogen receptor and co-activators [39]. Recent studies in HPV transgenic mouse models provide evidence that estrogen and its nuclear receptor promote cervical cancer in combination with HPV oncogenes [40]. It is therefore tempting to speculate that ACTN4 plays a role in this important effect.
Based on the semi-quantitative NSAF method, proteins were selected for their differential expression between the two conditions. Statistical testing (Mann-Whitney U) resulted in four proteins with a significant (p < 0.05) difference in abundance (Table 3). Although these four proteins show a statistically significant difference during this discovery phase, a verification step is needed to confirm these results. When both the p value and the frequency of presence were taken into account, annexin A2 and PKM2 were most promising (p ≤ 0.03) because both were identified in at least 9 of 12 individual samples, with a three-fold upregulation for the precancerous condition (Table 3). Despite the identification of annexin A2 in nearly every sample (11/12 samples) with an acceptable difference in abundance (p = 0.03), the inter-individual variations (CV 77%) for the healthy group were much higher compared to PKM2 (35%). Therefore, PKM2 (Acc No: P14618) was selected for verification by ELISA.
PKM2 is a glycolytic enzyme that contributes to the energy supply of the cell. This protein has two isoforms, an embryological isoform (M2) with elevated enzymatic activity and an adult isoform (M1). Recently, it was demonstrated that malignant transformations of cells by HR-HPV oncogenes E6 and E7 induced a PKM switch from M1 to M2 [41]. This switch causes a shift from normal cellular metabolism to elevated glycolysis, which is beneficial for the growth and neoplasm of tumor cells [42]. Unfortunately, we were not able to confirm the difference in PKM2 concentrations between CVFs from healthy and precancerous individuals, although two ELISA kits from different suppliers were used. It is possible that the semi-quantitative spectral counting method may be prone to too much variation. Alternatively, we have previously observed that the CVF matrix is not optimal for some ELISA configurations and this may be the case for the two PKM2 ELISA assays.

Verification of alpha-actinin-4 levels
Alpha-actinin-4, the most promising candidate biomarker, was verified on 57 additional CVF samples (28 individual samples and 29 longitudinal samples) by ELISA, a number that certainly fulfills the requirements for the discovery/verification phase of protein biomarkers [43]. The ELISA results showed that the concentration of ACTN4 is elevated in the HPV-infected patients compared to the non-infected patients, thereby confirming the LC-MS/MS results. However, infection with LR-HPV (HPV-6 and HPV-11) also resulted in higher ACTN4 concentrations in the CVF. Although the concentrations of ACTN4 measured in the CVFs from LR-HPV-infected individuals are higher than those from non-infected individuals, no significant difference in the abundance level was observed (p = 0.052). Because these findings are based on the analysis of samples from only four single LR-HPV-infected patients, more of this type of sample must be evaluated before conclusions can be drawn. One non-infected patient showed an outlier value of 85.6 pg/ml of ACTN4. This patient did not have a known history of HR-HPV infection, but the clinical file showed ischemic heart disease and a prevalence of breast cancer. It is therefore possible that this elevated concentration of ACTN4 is caused by the medical treatment or an LR-HPV infection other than HPV-6 or HPV-11. Therefore, extensive validation of this marker on a larger number of samples, including those from patients suffering from other diseases and infections (HIV, herpes, bacterial vaginosis, Chlamydia etc.), is the next step to determine whether ACTN4 is suitable for diagnostic purposes. In these studies, one could set a threshold at 18 pg/ml to discriminate between samples from healthy versus HPV-infected individuals.
Although ACTN4 levels did not numerically correlate with the viral loads, an ascending and descending trend could clearly be observed in patients who either started an infection or cleared the virus. We also observed that infections over longer periods caused accumulation of the marker. A high level of ACTN4 may therefore be an indication of a persistent oncogenic HPV infection and an increased chance of developing cervical cancer [6,44]. It was generally observed in longitudinal samples that total viral loads of approximately 100 particles/cell caused ACTN4 levels above the threshold value of 18 pg/ml. An exception to this rule was observed in patient 9, in which a viral load of 5,159 HPV35 particles/cell resulted in only 12.7 pg/ml ACTN4 (Figure 3). Whether this false negative result can be explained by the very low oncogenic potential of HPV35 must be investigated [2]. Conversely, two patients infected with very low doses of HPV6 had exceptionally high levels of ACTN4 (Table 4). These false positives could be the result of an exceptionally high ACTN4-inducing capacity of HPV6 or of an additional viral, fungal, or bacterial infection that also caused an increase of ACTN4 levels in the CVF. One must be aware, however, that release of ACTN4 is possibly a consequence of an immune-induced lysis of precancerous lesions, since HPV virions are shed into the environment in the absence of lysis or necrosis [45]. This suggests that HPV virus production and ACTN4 release do not necessarily occur simultaneously, although they usually do. In the two above-mentioned patients, however, virus production may have stopped while precancerous lesions or benign tumors continue to be a target for the crippled immune system, resulting in a concomitant release of ACTN4. Further validation steps, on a larger number of samples, are required to determine the specificity and sensitivity of ACTN4 for the presence of HPV and/or precancerous lesions.
Nevertheless, these results suggest that the HPV genotype, duration of infection and immune reaction may determine the abundance of ACTN4 in the CVF.

Application and future perspectives
Although earlier screening programs have undoubtedly reduced mortality and morbidity caused by cervical cancer, the characterization of one or more proteins that correlate with HPV infection and/or the pre-malignant state will further improve the efficiency of the current screening methods for cervical cancer [8]. As such, false positive and false negative diagnoses will be decreased so that gynaecologists can focus on at-risk patients. In addition to diagnostic improvement, CVF protein biomarkers such as ACTN4 allow for the development of relatively simple antibody-based tests ('dipstick'). A significant portion of the female population does not participate in any screening, especially those who live in low-resource countries, and many of these women would be willing to use a self-sampling test or use self-sampling devices for CVF collection [9,46,47].

Supporting Information
Figure S1 Calibration curve for RP-C4 protein quantification. (JPG)