A multiplex biomarker assay improves the diagnostic performance of HE4 and CA125 in ovarian tumor patients

Objective Survival in epithelial ovarian cancer (EOC) remains poor. Most patients are diagnosed in late stages, and early diagnosis increases the chance of survival. We used the proximity extension assay from Olink Proteomics to search for new protein biomarkers with the potential to improve the diagnostic performance of CA125 and HE4 in patients with ovarian tumors. Material and methods Plasma samples were obtained from 180 women with ovarian tumors: 30 benign tumors, 28 borderline tumors, 25 early EOC cases (FIGO stage I) and 97 advanced EOC cases (FIGO stages II-IV). Proteins were measured using the Olink® Oncology II and Inflammation panels. For statistical analyses, patients were grouped for two comparisons: benign tumors versus cancer, and benign tumors versus borderline tumors + cancer. Results We analyzed 177 biomarkers. Thirty-four proteins had ROC AUC > 0.7 for discrimination between benign tumors and cancer, and fifteen proteins had ROC AUC > 0.7 for discrimination between benign tumors and borderline tumors + cancer. HE4 ranked highest in both comparisons. A reference model with HE4, CA125 and age (AUC 0.838 for benign tumors vs. cancer and AUC 0.770 for benign tumors vs. borderline tumors + cancer) was compared with the reference model plus each of the remaining proteins with AUC > 0.7. ITGAV was the only individual biomarker found to improve the diagnostic performance of the reference model, to AUC 0.874 for benign tumors vs. cancer and AUC 0.818 for benign tumors vs. borderline tumors + cancer (p < 0.05). Cross-validation and LASSO regression were combined to select multiple-biomarker combinations. The best performing model for discrimination between benign tumors and borderline tumors + cancer was a 6-biomarker combination (HE4, CA125, ITGAV, CXCL1, CEACAM1, IL-10RB) plus age (AUC 0.868, sensitivity 0.86 and specificity 0.82; p = 0.016 for comparison with the reference model).
Conclusion HE4 was the best performing individual biomarker for discrimination between benign ovarian tumors and EOC including borderline tumors. The addition of other carcinogenesis-related biomarkers in a multiplex biomarker panel can improve the diagnostic performance of the established biomarkers HE4 and CA125.


Introduction
Around 700 Swedish women are diagnosed with ovarian cancer or borderline tumors every year. Symptoms are few and non-specific in the early stages, causing delays in diagnosis and treatment. While patients with borderline tumors have an excellent prognosis, with a five-year survival rate of 97%, the prognosis is poor in ovarian cancer patients. Half will die within five years of diagnosis [1,2].
Around 90% of ovarian cancers are epithelial tumors; the remaining 10% comprise germ cell and sex cord-stromal tumors. The main morphological subtypes of epithelial ovarian cancer (EOC) are high-grade serous (HGSC, 70%), endometrioid (EC, 10%), clear cell (CCC, 10%), mucinous (MC, 3%) and low-grade serous carcinoma (LGSC, <5%) [3]. These subtypes differ in origin and behavior and respond very differently to oncological treatment [4,5]. Despite advances in surgical and oncological treatment, little improvement has been seen in long-term survival in EOC [6,7]. The majority of patients are diagnosed in late stages. To improve survival, patients must be diagnosed earlier, while the disease is still curable. A screening method for ovarian cancer in the general population has been sought for decades. Two large-scale prospective population studies, the PLCO and UKCTOCS trials, were unable to show a significant decrease in ovarian cancer mortality from screening with the plasma protein biomarker CA125 and/or transvaginal ultrasound [8,9].
Apart from screening, one route to earlier ovarian cancer diagnosis is improved risk assessment when a patient presents with an adnexal mass. Patients with an estimated high risk of malignancy should be referred to the appropriate level of care without unnecessary delay. The multivariate Risk of Malignancy Index (RMI) algorithm, which incorporates CA125, an ultrasound score and menopausal status, has been in clinical use since the 1990s [10]. The use of RMI requires ultrasound competence, which is not always available at the primary care level. In 2009, Moore et al. introduced the Risk of Ovarian Malignancy Algorithm (ROMA), based on CA125, HE4 and menopausal status, which dispenses with the need for ultrasound evaluation [11]. In a 2010 study comparing ROMA and RMI, Moore et al. [12] found better performance for ROMA than for RMI, although these findings have been questioned by subsequent studies [12-15]. Both algorithms have reduced sensitivity and specificity in the early stages of EOC, when they would be of most diagnostic value [16]. In 2015, Karlsen et al. [17] introduced a modified version of ROMA, the Copenhagen Index (CPH-I), which replaces menopausal status with age. CPH-I, ROMA and RMI had comparable performance in a multicenter study [17]. However, ultrasound-based models have been found superior for the preoperative assessment of an adnexal mass, provided that good-quality ultrasonography is available [18]. Many research groups, including ours, have evaluated a range of other biomarkers and biomarker combinations for their potential use in ovarian cancer [19,20]. CA125 continues to stand out as the single best biomarker [21,22], and considerable research has focused on the search for additional biomarkers to improve on the performance of CA125 alone [23].
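As an illustration of how an index of this kind combines its inputs, the sketch below computes the RMI in its original "RMI I" form as described by Jacobs et al. in 1990 (RMI = U × M × CA125, with CA125 in U/mL). The text does not say which RMI variant reference [10] uses, so the U and M weights here are an assumption; later variants (RMI II-IV) weight the ultrasound and menopause terms differently. The function and argument names are hypothetical.

```python
def rmi_i(n_ultrasound_features: int, postmenopausal: bool, ca125_u_per_ml: float) -> float:
    """Risk of Malignancy Index, RMI I form (assumed variant): RMI = U x M x CA125.

    n_ultrasound_features: number of suspicious ultrasound findings present
    (multilocular cyst, solid areas, bilateral lesions, ascites,
    intra-abdominal metastases), from 0 to 5.
    """
    if not 0 <= n_ultrasound_features <= 5:
        raise ValueError("expected 0-5 ultrasound features")
    # U = 0 for no features, 1 for one feature, 3 for two or more features.
    u = 0 if n_ultrasound_features == 0 else 1 if n_ultrasound_features == 1 else 3
    # M = 1 if premenopausal, 3 if postmenopausal.
    m = 3 if postmenopausal else 1
    return u * m * ca125_u_per_ml
```

For example, a postmenopausal patient with two suspicious ultrasound features and CA125 of 100 U/mL would score 3 × 3 × 100 = 900 under these assumed weights.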
Lately, researchers have turned to the rapidly evolving field of proteomics in the search for new candidate biomarkers, using new techniques for high throughput multiplex analysis in large-scale protein studies [24,25].
In this study we analyzed the Olink® Oncology II and Inflammation panels (177 unique protein biomarkers in total) in 180 women with benign tumors, borderline tumors, early (stage I) or late (stage II-IV) EOC, with the aim of identifying new candidate biomarkers with the potential to improve the performance of HE4 and CA125 for discrimination between benign disease and EOC. We tested each individual biomarker in combination with HE4, CA125 and age, i.e. three-biomarker models plus age. Only the addition of ITGAV improved the reference model of HE4, CA125 and age significantly. To test whether a multiplex biomarker model could further improve the performance of the reference model, we combined cross-validation with LASSO regression. A 6-biomarker model (HE4, CA125, CXCL1, ITGAV, CEACAM1, IL-10RB) plus age was found to be the best model for discrimination between benign tumors and EOC including borderline tumors.

Material and methods
A single-cohort design was used for biomarker discovery; positive findings were intended to be validated in a larger, subsequent cohort of patients.
Peripheral blood samples were obtained preoperatively from 180 women with an adnexal mass admitted for surgery at the Department of Obstetrics and Gynecology, Skåne University Hospital, Lund, Sweden, between 2005 and 2012. Blood was collected in citrate tubes and centrifuged, and the plasma was stored at −20 °C until analysis. All diagnoses were verified by histopathologic examination. The histological type and the stage of disease according to the International Federation of Gynecology and Obstetrics (FIGO) were available for all malignant cases. The patient cohort included 30 cases of benign adnexal mass, 28 borderline tumors, 25 early EOC cases (FIGO stage I) and 97 advanced EOC cases (FIGO stages II-IV) (Table 1). The frozen plasma samples were shipped to Olink Proteomics AB, Uppsala, Sweden, for analysis.

Proximity extension assay
Proteins were measured using the Olink® Oncology II and Inflammation panels (Olink Proteomics AB, Uppsala, Sweden) according to the manufacturer's instructions. The biomarkers included in each panel are listed in the supporting information (S1 and S2 Files). The Proximity Extension Assay (PEA) technology used in the Olink protocol has been described in detail [26]; it enables 92 analytes to be measured simultaneously in 1 μL of each sample. Pairs of oligonucleotide-labeled antibody probes bind to their target protein, and when the two probes are brought into close proximity their oligonucleotides hybridize pairwise. Addition of a DNA polymerase then leads to a proximity-dependent DNA polymerization event that generates a unique PCR target sequence. The resulting DNA sequence is subsequently detected and quantified using a microfluidic real-time PCR instrument (Biomark HD, Fluidigm). The final assay read-out is given in Normalized Protein eXpression (NPX) values, an arbitrary unit on a log2 scale in which a higher value corresponds to higher protein expression. NPX values are relative and cannot be compared between different proteins. Analyses were performed by biomedical technicians at Olink Proteomics AB in Uppsala, Sweden, who were blinded to the disease status of the patients. Samples were randomized across the plates and run in duplicate. Data were quality controlled and normalized using an internal extension control and an inter-plate control to adjust for intra- and inter-run variation. All assay validation data (detection limits, intra- and inter-assay precision data, etc.) are available on the manufacturer's website (www.olink.com).
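Because NPX is on a log2 scale, a difference in NPX for the same protein can be read as an approximate fold change in signal. A minimal sketch (the function name is illustrative, and the conversion is only meaningful within one protein assay, since NPX values are not comparable across proteins):

```python
def npx_fold_change(npx_a: float, npx_b: float) -> float:
    """Approximate fold change of one protein between two samples or group means.

    NPX is a relative log2-scale unit, so a difference of d NPX corresponds
    to a 2**d-fold difference in signal for that protein.
    """
    return 2.0 ** (npx_a - npx_b)
```

For instance, a group mean one NPX unit higher corresponds to roughly twice the signal, and two units lower to roughly a quarter.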

Statistical analyses
Hierarchical clustering analysis and principal component analysis were performed to search for clusters of proteins associated with the different tumor categories. Patients were subsequently grouped for two comparisons, benign tumors versus cancer and benign tumors versus borderline tumors and cancer, and differences in protein expression between groups were analyzed with Student's t-test. P-values were adjusted for multiple comparisons using the false discovery rate (FDR), and an adjusted p-value < 0.001 was taken to indicate a statistically significant difference. Each biomarker was used as a continuous variable in univariate logistic regression models, with the binary outcome benign tumors versus cancer or benign tumors versus borderline tumors and cancer. Receiver operating characteristic (ROC) curves were constructed, and the area under the curve (AUC) was calculated with 95% confidence intervals using the non-parametric bootstrap procedure. To evaluate the biomarkers' potential to improve the performance of the ROMA and CPH-I algorithms, a multivariate logistic regression model including HE4, CA125 and age was constructed to serve as a reference model. Each biomarker with AUC > 0.7 was added in turn to the reference model. The classification accuracy of each model was evaluated with the AUC, and the AUC of each model was compared with that of the reference model using DeLong's method. A p-value < 0.05 for differences in AUC was considered statistically significant. For each model, the sensitivity at a specificity of 0.95 and the specificity at a sensitivity of 0.95 were calculated.
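The sketch below illustrates, in Python with NumPy, the main quantities in this paragraph: Benjamini-Hochberg FDR adjustment, the AUC, a bootstrap confidence interval, sensitivity at a fixed specificity, and DeLong's test for two correlated AUCs. It is not the authors' R code, and the source does not name the packages used. Assumptions not stated in the source: the percentile bootstrap variant, 2,000 resamples, and the rule "positive if score ≥ cut-off". For a single continuous marker, the fitted probabilities of a univariate logistic regression are a monotone transform of the marker, so their AUC equals the marker's AUC taken in the direction of the fitted coefficient; fitted probabilities from a multivariate model can be passed in as the score in the same way.

```python
import math

import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum over larger p-values.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def _components(y, score):
    """AUC plus DeLong structural components via the Mann-Whitney kernel.

    psi = 1 if case score > control score, 0.5 if tied, 0 otherwise.
    """
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    diff = score[y][:, None] - score[~y][None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    # Overall AUC, per-case means (V10) and per-control means (V01).
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def auc(y, score):
    return _components(y, score)[0]


def bootstrap_auc_ci(y, score, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC (patients resampled with replacement)."""
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].all() or not y[idx].any():
            continue  # AUC is undefined without both classes
        stats.append(auc(y[idx], score[idx]))
    alpha = (1.0 - level) / 2.0
    lo, hi = np.percentile(stats, [100 * alpha, 100 * (1 - alpha)])
    return lo, hi


def sensitivity_at_specificity(y, score, target_spec=0.95):
    """Best sensitivity over cut-offs with specificity >= target (positive if score >= t)."""
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    best = 0.0
    for t in np.append(np.unique(score), np.inf):
        if np.mean(neg < t) >= target_spec:
            best = max(best, float(np.mean(pos >= t)))
    return best


def delong_test(y, score_a, score_b):
    """Two-sided DeLong test comparing two correlated AUCs on the same patients."""
    auc_a, v10_a, v01_a = _components(y, score_a)
    auc_b, v10_b, v01_b = _components(y, score_b)
    s10 = np.cov(np.vstack([v10_a, v10_b]))  # 2x2 covariance across cases
    s01 = np.cov(np.vstack([v01_a, v01_b]))  # 2x2 covariance across controls
    s = s10 / v10_a.size + s01 / v01_a.size
    var = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var <= 0:
        return auc_a, auc_b, 1.0  # identical or degenerate scores
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2.0))
```

For the reference-model comparisons, `score_a` and `score_b` would be the fitted probabilities of the extended and the reference model for the same patients; DeLong's test accounts for the correlation between the two AUCs that arises from scoring the same patients twice. Specificity at 95% sensitivity can be computed analogously by swapping the roles of cases and controls.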
We wished to test whether a multiplex biomarker model could further improve the diagnostic performance of the reference model. To select which biomarkers to include in a final logistic regression model in addition to HE4, CA125 and age, a combination of cross-validation and LASSO regression was employed. First, the data were randomly split 50/50 into a training set and a test set. In the training set, the shrinkage parameter (λ) was estimated using k-fold cross-validation. The estimated shrinkage parameter, λCV, was then used in the test set to perform variable selection, and the selected variables and the absolute values of their coefficients were saved. This process was repeated 10 times. The variables were then ranked by the number of times they were selected and by the sum of their absolute estimated coefficients. The lowest-ranked variable was removed, and the entire process was repeated until a final model was selected. The final models were estimated with logistic regression. ROC curves were constructed and the AUC calculated with 95% confidence intervals using the non-parametric bootstrap procedure. The AUC of each model was compared with that of the reference model using DeLong's method, and a p-value < 0.05 for differences in AUC was considered statistically significant.
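The selection loop described above can be sketched as follows. This is a hypothetical Python/NumPy illustration, not the authors' R code. The L1-penalized logistic regression is fitted by proximal gradient descent, a swapped-in solver chosen only to keep the example dependency-free. HE4, CA125 and age are assumed to be left unpenalized because they are always retained, and predictors are standardized. The λ grid, the number of folds and the stopping rule `n_keep` are also assumptions, since the text only says the process was repeated "until a final model was selected".

```python
import numpy as np


def fit_l1_logistic(X, y, lam, unpenalized=(), n_iter=300):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|w_j|) over the penalised columns;
    the intercept and the column indices in `unpenalized` are not shrunk.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, using an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    penalised = np.ones(p, dtype=bool)
    penalised[list(unpenalized)] = False
    for _ in range(n_iter):
        prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ w + b)))  # numerically stable sigmoid
        w = w - step * (X.T @ (prob - y)) / n
        b = b - step * np.mean(prob - y)
        # Proximal step for the L1 penalty: soft-thresholding.
        shrunk = np.abs(w[penalised]) - step * lam
        w[penalised] = np.sign(w[penalised]) * np.maximum(shrunk, 0.0)
    return w, b


def mean_log_loss(X, y, w, b):
    z = X @ w + b
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def cv_lambda(X, y, lambdas, k, rng, unpenalized):
    """Return the lambda with the lowest k-fold cross-validated log-loss."""
    folds = rng.permutation(len(y)) % k
    losses = []
    for lam in lambdas:
        loss = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            w, b = fit_l1_logistic(X[tr], y[tr], lam, unpenalized)
            loss += mean_log_loss(X[te], y[te], w, b)
        losses.append(loss / k)
    return lambdas[int(np.argmin(losses))]


def selection_scores(X, y, names, forced, n_repeats, k, lambdas, rng):
    """Repeat: 50/50 split, lambda by CV on the training half, LASSO fit on the test half."""
    unpen = [names.index(f) for f in forced]
    counts = dict.fromkeys(names, 0)
    abs_sums = dict.fromkeys(names, 0.0)
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        train, test = idx[: len(y) // 2], idx[len(y) // 2:]
        lam = cv_lambda(X[train], y[train], lambdas, k, rng, unpen)
        w, _ = fit_l1_logistic(X[test], y[test], lam, unpen)
        for name, coef in zip(names, w):
            if coef != 0.0:  # variable selected in this repeat
                counts[name] += 1
                abs_sums[name] += abs(coef)
    return counts, abs_sums


def backward_lasso_selection(X, y, names, forced, n_keep, n_repeats=10, k=5,
                             lambdas=np.logspace(-3, -0.5, 6), seed=0):
    """Drop the lowest-ranked candidate until n_keep variables remain.

    `n_keep` is a hypothetical stopping rule; forced variables are never dropped.
    """
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise columns
    y = np.asarray(y, dtype=float)
    names, kept = list(names), list(names)
    while len(kept) > n_keep:
        cols = [names.index(v) for v in kept]
        counts, abs_sums = selection_scores(X[:, cols], y, kept, forced,
                                            n_repeats, k, lambdas, rng)
        candidates = [v for v in kept if v not in forced]
        if not candidates:
            break
        # Rank by selection frequency, then by summed |coefficient|; drop the lowest.
        kept.remove(min(candidates, key=lambda v: (counts[v], abs_sums[v])))
    return kept
```

The returned variable list would then be refitted as an ordinary (unpenalized) logistic regression, as the text describes for the final models.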
All statistical analyses were carried out using R v 4.0.0 (R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/).

Ethics statement
Written informed consent was obtained from all study participants.

Results
Out of the 180 patient samples, eight samples did not pass internal quality control in the PEA analyses (www.olink.com) and were excluded from statistical analyses. These samples comprised two borderline tumors and six advanced stage EOC cases. The analyses below include 172 patients.
Hierarchical clustering analysis was performed for the whole patient cohort and for serous tumors alone. Heat maps indicating protein expression levels for each patient are shown in S1 Fig.

Benign tumors vs. cancer
Out of the 177 biomarkers analyzed, a statistically significant difference in NPX levels between benign tumors and cancer was found for eight proteins (using a conservative cut-off of p < 0.001, with p-values adjusted using the False Discovery Rate (FDR)). HE4 (WFDC2) and CA125 (MUC16) ranked highest. Most of the proteins were up-regulated in cancer patients, although two proteins, ITGAV and DNER, showed lower NPX levels in cancer patients (Table 2). Box plots for the six proteins with the lowest p-values are shown in S3 Fig. Table 3 shows the AUC values of the individual proteins for discriminating benign tumors from cancer. Thirty-four proteins had AUC > 0.7. HE4 (WFDC2) ranked highest, with AUC 0.830 (95% CI 0.739-0.921). ROC curves for the six proteins with the highest AUC values are depicted in Fig 1. Table 4 shows the AUC values, sensitivities (at 95% specificity) and specificities (at 95% sensitivity) for the reference model with HE4, CA125 and age (AUC 0.838, 95% CI 0.752-0.924) and for the reference model with the addition of each of the remaining 32 proteins with AUC > 0.7. ITGAV was the only biomarker that significantly improved the diagnostic performance of the reference model, to AUC 0.874 (95% CI 0.799-0.949) (p = 0.045). Sensitivities and specificities were low, with wide confidence intervals, calling for caution when interpreting the results.
Cross-validation and LASSO regression were used to select multi-biomarker combinations for logistic regression models. A six-biomarker model including HE4, CA125, CEACAM1, CTSV, CXCL6 and S100A4 plus age was found to be the best model for discriminating between benign tumors and EOC, with AUC 0.921 (95% CI 0.863-0.979) and sensitivity 0.897 / specificity 0.889 at the best cut-off point (p = 0.025 for comparison with the reference model) (Table 5 and Fig 2).
and for the reference model with the addition of each one of the remaining 13 proteins with AUC > 0.7. Again, only the addition of ITGAV significantly increased the diagnostic performance of the reference model, to AUC 0.818 (95% CI 0.737-0.900) (p < 0.05). For both models the sensitivities and specificities were low and the confidence intervals wide, indicating statistical uncertainty. Multi-marker models were developed using cross-validation and LASSO regression. A six-biomarker model (HE4, CA125, CXCL1, ITGAV, CEACAM1, IL-10RB) plus age was the best model for discrimination between benign tumors and borderline tumors + cancer, with AUC 0.868 and sensitivity 0.86 / specificity 0.82 at the best cut-off point (p = 0.016 for comparison with the reference model) (Table 9 and Fig 4).
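The results report sensitivity and specificity "at the best cut-off point" without defining it. One common definition, assumed here rather than confirmed by the source, is the cut-off that maximizes Youden's J = sensitivity + specificity - 1. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def youden_cutoff(y, score):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1.

    A case is called positive when score >= cut-off. Returns
    (cut_off, sensitivity, specificity); ties in J keep the lowest cut-off.
    """
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    best_j, best = -np.inf, None
    for t in np.unique(score):
        sens = float(np.mean(pos >= t))
        spec = float(np.mean(neg < t))
        if sens + spec - 1.0 > best_j:
            best_j, best = sens + spec - 1.0, (float(t), sens, spec)
    return best
```

Applied to the fitted probabilities of a model, this returns the probability threshold together with the sensitivity and specificity at that threshold. Other definitions, such as the point closest to the top-left corner of the ROC curve, can give a different cut-off.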

Our findings
In the current study, only HE4 and CA125 showed significant differences in expression levels between benign tumors and borderline tumors + cancer, whereas six additional proteins differed significantly in expression levels between benign tumors and cancer. The diagnostic performance of the reference model of HE4, CA125 and age was also higher for benign tumors vs. cancer than for benign tumors vs. borderline tumors + cancer, illustrating the limitations of protein biomarkers for discriminating benign from borderline tumors. Biomarker levels may be normal or only slightly elevated in borderline tumors [11,27,28], and elevated biomarker levels can occur in benign ovarian tumors as well as in a range of other benign conditions, including endometriosis, pelvic inflammatory disease, early pregnancy and ascites of any cause [29,30]. While this is bound to lower the performance of biomarker-based algorithms, it is debatable whether this matters in a clinical setting, as borderline tumors have an excellent prognosis and rarely progress to invasive cancer, even after conservative surgery for a presumed benign adnexal mass [31]. Integrin subunit alpha V (ITGAV) was the only individual biomarker found to increase the performance of the reference model of HE4, CA125 and age above the significance threshold, for both comparisons. ITGAV is a subunit of the alpha V integrin receptor subfamily. Integrins are transmembrane receptors that bind extracellular matrix proteins and have a key role in angiogenesis. In epithelial ovarian cancer cells, ITGAV expression is essential for peritoneal dissemination [32]. Increased expression of ITGAV in tumor tissue has been associated with poor prognosis in ovarian cancer [33]. Interestingly, ITGAV levels were lower in plasma from patients with cancer than in plasma from patients with benign tumors in our study, in line with the findings of Skubitz et al., who in their recent study of 92 biomarkers (Olink's Oncology II panel) reported lower levels of ITGAV in serum from ovarian cancer patients compared with healthy women [34].
We were interested to see whether a combination of multiple biomarkers could further improve the diagnostic performance of our reference model of HE4, CA125 and age. To lower the risk of upward bias, we used a combination of cross-validation and LASSO regression to select biomarkers for multivariate logistic regression models. The resulting models for discrimination between benign tumors and cancer and between benign tumors and borderline tumors + cancer differed considerably in their selection of biomarkers, further highlighting the diagnostic challenges posed by borderline tumors in ovarian cancer diagnostics. The best model to discriminate benign ovarian tumors from EOC including borderline tumors was the six-biomarker model (HE4, CA125, ITGAV, CXCL1, CEACAM1, IL-10RB) plus age.
CXCL1 is an inflammatory chemokine that promotes angiogenesis and the recruitment of neutrophils [35,36]. Overexpression of CXCL1 induces EOC cell proliferation in vitro [37]. Wang et al. (2011) reported CXCL1 to be overexpressed in serum from ovarian cancer patients, and a biomarker model including CXCL1, CCL18 and CA125 discriminated ovarian cancer from benign ovarian tumors and healthy controls with a sensitivity of 92.6% for ovarian cancer, together with an impressively high specificity of 99% against healthy controls and 94% against benign tumors [38]. In line with the findings of Wang et al., CXCL1 levels were higher in plasma from ovarian cancer patients in our study. Carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) is a member of the immunoglobulin superfamily of cell adhesion molecules (IgCAMs). CEACAM1 has important roles in angiogenesis, regulation of insulin action and immune responses, and is crucial in the progression and metastasis of a range of cancers, exerting oncogenic as well as tumor-suppressive actions [39,40]. Because it has inhibitory functions in immune cells, including T and NK cells, and is also expressed on tumor cells, CEACAM1 is a promising target for immunotherapy [41]. High expression of CEACAM1 correlates with better prognosis in patients with advanced ovarian cancer, suggesting a tumor-suppressor function in ovarian cancer [42]. In the current study, plasma levels of CEACAM1 were lower in patients with cancer than in patients with benign tumors, supporting the role of CEACAM1 as a tumor suppressor. Interleukin 10 receptor subunit beta (IL-10RB / IL-10R2) is a subunit of the heterodimeric interleukin 10 receptor complex, expressed on most immune cells [43]. IL-10 is an important immunoregulatory cytokine [44,45]. IL-10RB is also a subunit of the receptors for several other members of the IL-10-interferon family, including IL-22 [43]. IL-10RB is overexpressed in colorectal cancer and contributes to colorectal carcinogenesis through binding of IL-22 [46]. IL-10 levels are reported to be increased in serum and ascites from patients with ovarian cancer, with higher levels in advanced disease [47]. In the current study, plasma levels of IL-10RB were higher in patients with cancer. To our knowledge, this is the first study reporting circulating plasma levels of IL-10RB in patients with ovarian tumors.

PLOS ONE
In summary, biomarkers associated with both oncogenic (ITGAV, CEACAM1, CXCL1, IL-10RB) and tumor-suppressive (CEACAM1) actions were found to increase the diagnostic performance of HE4, CA125 and age in our study. Of these markers, only ITGAV had AUC > 0.7 as an individual marker for discrimination between benign tumors and borderline tumors + cancer, indicating that biomarkers with poor individual performance can still add diagnostic value in combination with other markers.

Biomarkers for ovarian cancer detection
In recent years, proximity extension assay (PEA) technology has been employed to identify novel protein biomarkers and biomarker combinations for early detection of ovarian cancer, with promising results [34,48-50]. The addition of inflammatory and immunological biomarkers to CA125 and HE4 holds potential to increase sensitivity and specificity compared with the established algorithms. However, for a screening method for a disease with the incidence of ovarian cancer to be acceptable in a general population setting, a sensitivity of at least 75% and a specificity of 99.6%, corresponding to a positive predictive value (PPV) of 10%, is recommended [51,52]. Adding to the difficulty of identifying a biomarker panel with high sensitivity and specificity for ovarian cancer is the heterogeneity of the disease, with different morphological subtypes expressing different patterns of biomarkers [53,54]. Also, as discussed above, borderline and benign tumors can present with normal or slightly elevated biomarker levels [11,27-30]. Boylan et al. 2017 [48], in their study of 81 women (healthy controls, benign disease, and early and late stage serous ovarian cancer), increased the sensitivity for detection of early stage serous cancer versus healthy women from 0.93 (CA125 alone) to 0.95 (at a specificity of 0.95) with a 12-protein classifier derived from analysis of the 92 proteins of the Olink Oncology I v2 panel (CA125, CD40-L, CD69, CXCL9, CXCL13, EGFR, EpCAM, DJ-1 (PARK7), SELE, LAP TGF-beta-1, TF and VEGFR2) [48]. The same group recently published a study of 61 patients with late-stage HGSC and 88 healthy controls analyzed with the Olink Oncology II panel. A multi-protein classifier of six biomarkers (CA125, FGFBP1, S100A4, EGF, ICOSLG and MSLN) improved sensitivity from 0.85 (CA125 only) to 0.951 at a specificity of 0.996 for distinguishing late-stage HGSC from healthy women [34]. Enroth et al., in their recent large-scale study analyzing 593 plasma proteins with PEA, identified a biomarker signature of 11 proteins (CA125, SPINT1, TACSTD2, CLEC6A, ICOSLG, MSMB, PROK1, CDH3, HE4, KRT19 and FR-alpha) plus age that discriminated ovarian cancer (all stages and histologies) from benign disease with a sensitivity of 0.85 at a specificity of 0.93 (AUC = 0.94, PPV = 0.92) [50].
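The recommended screening thresholds are linked by the standard relation PPV = sens·p / (sens·p + (1 - spec)(1 - p)), where p is the disease prevalence in the screened population. The source does not state the prevalence behind the 10% PPV figure, so the sketch below only solves for the prevalence that these three numbers jointly imply; the function names are illustrative.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def prevalence_for_ppv(sensitivity: float, specificity: float, target_ppv: float) -> float:
    """Prevalence at which a test with the given accuracy reaches target_ppv.

    From PPV / (1 - PPV) = sens * p / ((1 - spec) * (1 - p)), solved for p.
    """
    odds = target_ppv / (1.0 - target_ppv)
    fpr = 1.0 - specificity
    return odds * fpr / (sensitivity + odds * fpr)
```

With sensitivity 0.75, specificity 0.996 and a target PPV of 0.10, the implied prevalence is about 0.06% (roughly 6 per 10,000). The calculation also shows why specificity dominates at low prevalence: at these values, each percentage point of false positive rate outweighs the true positives many times over.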
The PEA studies cited above excluded borderline tumors, with the exception of Enroth et al., who included borderline tumors in their final replication cohort as well as ovarian cancer samples of all histologies and stages [50]. The study by Boylan et al. 2017 [48] included serous cancer only, and Skubitz et al. 2019 [34] included late-stage HGSC only. These different study populations may explain the considerable variation in the biomarkers included in the multi-protein models derived from the three studies; only CA125 is included in all of them. Excluding borderline tumors, early-stage cancer and/or tumors of non-serous morphology is bound to strengthen the performance of a candidate biomarker panel. In the clinical setting, however, these tumors occur in the population of patients with an adnexal mass, which may also include rare non-epithelial and metastatic tumors of the ovary. Given the vast heterogeneity of this tumor population, it seems unlikely that a protein biomarker panel will reach a performance acceptable for screening at the population level.
In the common clinical challenge of risk assessment of an adnexal mass, HE4 and CA125 are validated for diagnostic use in the ROMA algorithm. A wide range of other protein biomarkers have been analyzed to date, with PEA and other diagnostic platforms. While a number of studies, including ours, indicate that some improvement in diagnostic accuracy can be gained from new biomarker combinations compared with CA125 and HE4, ultrasound-based models remain superior for assessing the risk of malignancy in women with an adnexal mass, albeit at the cost of lower specificity [14,18,55]. Biomarker tests can improve specificity, are available at the primary care level, and complement diagnostic imaging in current standard care. In the triage of a patient with an adnexal mass, a patient at high risk of cancer according to a biomarker test can be referred to a tertiary center for further investigation, including specialist ultrasound and/or CT/MRI, before surgery. Histopathological examination of tissue nevertheless remains the gold standard for diagnosis.

Strengths/Limitations
The heterogeneity of our small study population and the variation in sample size between groups call for the statistical analyses to be interpreted with caution. Cross-validation was employed to reduce the risk of upward bias from fitting and testing our multi-biomarker models on the same patient population; however, owing to the single-cohort design, we were not able to validate our models in a larger cohort. Further studies will be needed to validate our findings.

Conclusion
HE4 was the best performing biomarker for discrimination of benign tumors versus EOC including borderline tumors in our study. ITGAV was the only individual biomarker found to improve the diagnostic performance of HE4, CA125 and age. Using LASSO regression, a multiplex model including 6 biomarkers (HE4, CA125, ITGAV, CXCL1, CEACAM1, IL-10RB) and age had the highest diagnostic accuracy for discrimination between benign ovarian tumors and EOC including borderline tumors. We find that the addition of other known carcinogenesis-related biomarkers in multiple marker combinations has potential to improve the performance of the established markers HE4 and CA125.