Accurate prognostication of locally advanced nasopharyngeal carcinoma (NPC) will benefit patients for tailored therapy. Here, we addressed this issue by developing a mathematical algorithm based on support vector machine (SVM) through integrating the expression levels of multi-biomarkers.
Ninety-seven locally advanced NPC patients in a randomized controlled trial (RCT), consisting of 48 cases serving as training set and 49 cases as testing set of SVM models, with 5-year follow-up were studied. We designed SVM models by selecting the variables from 38 tissue molecular biomarkers, which represent 6 tumorigenesis signaling pathways, and 3 EBV-related serological biomarkers. We designed 3 SVM models to refine prognosis of NPC with 5-year follow-up. The SVM1 displayed highly predictive sensitivity (sensitivity, specificity were 88.0% and 81.9%, respectively) by integrating the expression of 7 molecular biomarkers. The SVM2 model showed highly predictive specificity (sensitivity, specificity were 84.0% and 94.5%, respectively) by grouping the expression level of 12 molecular biomarkers and 3 EBV-related serological biomarkers. The SVM3 model, constructed by combination SVM1 with SVM2, displayed a high predictive capacity (sensitivity, specificity were 88.0% and 90.3%, respectively). We found that 3 SVM models had strong power in classification of prognosis. Moreover, Cox multivariate regression analysis confirmed these 3 SVM models were all the significant independent prognostic model for overall survival in testing set and overall patients.
Citation: Wan X-B, Zhao Y, Fan X-J, Cai H-M, Zhang Y, Chen M-Y, et al. (2012) Molecular Prognostic Prediction for Locally Advanced Nasopharyngeal Carcinoma by Support Vector Machine Integrated Approach. PLoS ONE 7(3): e31989. doi:10.1371/journal.pone.0031989
Editor: Qian Tao, The Chinese University of Hong Kong, Hong Kong
Received: September 15, 2011; Accepted: January 19, 2012; Published: March 9, 2012
Copyright: © 2012 Wan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the National Natural Science Foundation of China (Grants No. 30772476 & 30873084 to Q.L., No. 81000934 to X.B.W. and No. 60902076 to H.M.C.) and the Fundamental Research Funds for the Central Universities (X.B.W.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Nasopharyngeal carcinoma (NPC), an Epstein-Barr virus (EBV) associated malignancy, has a remarkable racial and geographical distribution in Southeast Asia , . Compared with the early stage patients, cancer mortality associated with disease relapse still sustained a high level in advanced NPC . An accurate identification of patient prognosis will benefit this subset for developing distinct therapeutic and follow-up strategies in future.
Biomarker has been proven to be critical in predicting disease prognosis by complimenting TNM classification for risk definition . More importantly, biomarkers, with dual functions for both disease monitoring and novel molecular targeting, had shed the light on personalized therapy. For example, overexpression of EGFR, which occurred in 90% of head and neck squamous cell carcinoma (HNSCC) , predicted an inferior patient outcome . EGFR monoclonal antibody Cetuximab had demonstrated a survival benefit in combination with chemotherapy or radiotherapy for HNSCC , . In recent BATTLE (Biomarker-Integrated Approaches of Targeted Therapy for Lung Cancer Elimination) study , the first large clinical trial to use tumor biomarkers to guide therapy, 11 biomarkers associated with four NSCLC molecular pathways were analyzed for directing treatment choice. The results showed that each of the four treatments (erlotinib, vandetanib, erlotinib plus bexarotene, and sorafenib) targeted potently a specific molecular signature. Thus, identifying the pathogenesis pathway related biomarkers, that not only refining the patient prognosis but also providing guidance for pathway specific target therapy, will be of great benefit for advanced cancer patients.
Data mining, including decision tree, neural networks (artificial and fuzzy), and SVM, has been applied to predict cancer patient prognosis , , . Taken breast cancer and NSCLC for example, SVM had been confirmed to be a strong tool to refine the patient prognosis by integrating multi-gene profile , . In head and neck cancer, the specific molecular pathway related biomarkers signature had not yet been characterized using the learning algorithms method based prognosis prediction model.
In the present study, we studied the expression levels of 38 markers, which represented 6 pathological signaling pathways, and 3 EBV-related serological biomarkers associated with tumorigenesis of NPC. We addressed the prognostic effect of multi-biomarkers integrated SVM models with special focus on whether SVM model could subgroup patient prognosis in head and neck cancer.
Immunohistochemical (IHC) Staining, Univariate and ROC Curve Analysis
The baseline of patient clinicopathologic features of these two cohorts were displayed in Table 1. The median follow-up period was 63.8 months (range: 9.5 to 89.9 months) for overall patients. As our previous report, the IC/RT and IC/CRT subgroups displayed the similar OS (P = 0.783). The median overall survival was 73.9 and 70.1 months, respectively, in IC/RT and IC/CRT subgroups. The 2-year and 5-year OS was respectively 84.1% and 73.8% in IC/RT subgroup, compared with 81.8% and 72.3% in IC/CRT subset. The typical IHC staining of 38 biomarkers in these NPC samples was shown in Figure 1. As revealed in Table 2, each feature, that dichotomized by ROC curve generated cutoff point (Figure 2A), was subjected to univariate analysis. In training subgroup (48 patients), high tumor CENP-H (HR, 4.698; P = 0.023) and MMP 2 (HR, 3.489; P = 0.039) expression were associated with poor OS. In testing set, high Aurora-A (HR, 3.647; P = 0.021), Bcl-2 (marginal; HR, 4.423; P = 0.052) and VCA-IgA (HR, 3.787; P = 0.017) levels predicted an inferior OS. For all patients enrolled, high Aurora-A (HR, 2.872; P = 0.010), MMP 2 (HR, 2.942; P = 0.010) and VCA-IgA (HR, 2.688; P = 0.014) levels were correlated with worse OS. ROC curve analysis showed that SVM models had the largest area under the curve (AUC) compared with each individual AUC of 38 tissue molecules and 3 serological biomarkers (Figure 2), suggesting that SVM models was the most powerful prognostic value in refining patient outcome.
The panel displayed the representative expression of 37 molecular biomarkers in tumor zone for locally advanced NPC (original magnification, ×400).
In training set (A), at each immunohistochemical staining score of 38 tissue molecules and 3 serological biomarkers, the sensitivity and specificity for the outcome being studied were plotted, thus generating a ROC curve. The score, that closest to the point with both maximum sensitivity and specificity (0.0, 1.0), was selected as the cutoff point for further analysis.
SVM1 and OS
The SVM1 model showed highly sensitivity by integrating the expression levels of 7 tissue molecular biomarkers, including of Aurora-A, Beclin 1, Ki-67, N-cadherin, nm23-H1, P27 and TIMP 2. After educating the model in the training set, we identified 19 patients with high risk to death and 30 patients with low risk at testing set. The 5-year OS of the subgroup with high risk to death was 38.9% compared with 89.5% in low risk subset (Figure 3A, P<0.0001). Specifically, the predictive value of SVM1 in sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy were 78.6%, 77.1%, 57.9%, 90.0% and 77.6%, respectively. Cox multivariate regression analysis confirmed that SVM1 model was indeed the significant independent predictive model for patient risk to death (Table 3; HR, 42.275; 95% CI, 2.474 to 722.425; P = 0.010).
For SVM1 model (A), a significant survival disadvange was observed for the high risk to death subgroup, which was identified by SVM1 model, in testing set (left panel) and overall patients (right panel). For SVM2 (B) and SVM3 model (C), a statistically OS difference was shown between high and low risk to death subgroups, which was indentified respectively by SVM2 and SVM3 model, in testing set (left panel) and overall patients (right panel).
By summarizing the training and testing set as a group, we identified 30 patients with high risk and 67 patients with low risk to death. The 5-year OS of the patients with high risk to death was 24.1% compared with 95.3% in low risk subgroup (Figure 3A, P<0.0001). Specifically, the predictive value of SVM1 in sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy were 88.0%, 81.9%, 62.9%, 95.2% and 83.5%, respectively. Cox multivariate regression analysis confirmed that SVM1 model was the significant independent predictive model for patient risk to death (Table 3; HR, 320.826; 95% CI, 36.705 to 2804.256; P<0.0001). Moreover, a prognostic effect on age, Aurora-A, Ki67 and P27 were also observed in overall patients, though with relatively low HR (Table 3). The clinical features, including of gender, TNM stage as well as therapeutic regimens, and other molecular biomarkers however failed to prove any prognostic value.
SVM2 and OS
The SVM2 model showed high specificity by grouping the expression levels of 12 tissue molecular biomarkers (nm23-H1, Pontin, cyclin D1, N-Cadherin, 14-3-3σ, Ki-67, Aurora-A, Bcl-2, Beclin 1, MMP 2, EZH2 and TIMP 2) and 3 EBV-related serological biomarkers (EA-IgA, VCA-IgA and AER). After educating the model in the training set, we identified 14 patients with high risk to death and 35 patients with low risk at testing set individually. The 5-year OS of the subset with high risk to death was 28.6% compared with 87.8% in low risk subgroup (Figure 3B, P<0.0001). In detail, the predictive value of SVM2 in sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy were 71.4%, 88.6%, 71.4%, 88.6% and 83.7%, respectively. Cox multivariate regression analysis confirmed that SVM2 model was indeed the significant independent predictive model for patient risk to death (Table 3; HR, 6055.528; 95% CI, 2.718 to 1.349×107; P = 0.027).
By summarizing the training and testing set as a group, we identified 25 patients with high risk and 72 patients with low risk to death. The 5-year OS of the patients with high risk to death was 16.0% compared with 94.2% in low risk subgroup (Figure 3B, P<0.0001). Specifically, the predictive value of SVM2 in sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy were 84.0%, 94.5%, 84.0%, 94.4% and 91.8%, respectively. Cox multivariate regression analysis confirmed that SVM2 model was the significant independent predictive model for patient risk to death (Table 3; HR, 346.294; 95% CI, 24.742 to 4846.721; P<0.0001). In addition, Cyclin D1, EA-IgA and VCA-IgA were also the independent prognostic factors in overall patients, though with relatively low HR (Table 3).
SVM3 and OS
The SVM3 model, that incorporating SVM1 with SVM2, was subjected to refine patient risk to death when risk definition discrepancy was confronted at SVM1 and SVM2. In SVM3 model, we integrated the expression level of 13 tissue molecular biomarkers (nm23-H1, Pontin, cyclin D1, N-Cadherin, 14-3-3σ, Ki-67, Aurora-A, Bcl-2, Beclin 1, MMP 2, EZH2, TIMP 2 and P27) and 3 EBV-related serological biomarkers (EA-IgA, VCA-IgA and AER). As shown in Figure 3C, we identified 13 patients with high risk and 36 patients with low risk to death at testing set. The 5-year OS of the subset with high risk to death was 30.8% compared with 85.2% in low risk subgroup (Figure 3C, P<0.0001). Specifically, the predictive value of SVM3 in sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy were 64.3%, 88.6%, 69.2%, 86.1% and 81.6%, respectively. Cox multivariate regression analysis confirmed that SVM3 model was indeed the independent predictive model for patient risk to death (Table 3; HR, 1401.433; 95% CI, 0.883 to 2.223×106; P = 0.054).
By summarizing the training and testing set as a group, we identified 24 patients with high risk and 73 patients with low risk to death. The 5-year OS of the patients with high risk to death was 16.7% compared with 92.8% in low risk subgroup (Figure 3C, P<0.0001). Specifically, the predictive value of SVM3 in sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy were 88.0%, 90.3%, 75.9%, 95.6% and 89.7%, respectively. Cox multivariate regression analysis confirmed that SVM3 was an independent predictive model for patient risk to death (Table 3; HR, 540.456; 95% CI, 33.336 to 8761.995; P<0.0001). Additionally, age, P27 and VCA-IgA showed prognostic effect for overall patients, though with the lower HR (Table 3).
The important challenge complementing the anatomic TNM staging prognostication is to integrate the nonanatomic molecular biomarkers . Indeed, circulating serological and tissue molecular prognostic factors were currently used for predicting cancer patient outcome individually. Here, we examined the expression levels of 38 tissue molecular biomarkers representing 6 pathological signaling pathways and 3 EBV-related serological biomarkers for further characterizing their prognostic value by constructing the SVM models in a randomized controlled trial. By integrating 16 biomarkers that displayed higher predictive values, we designed 3 SVM prognosis models. Our finding demonstrated that those 3 SVM models showed the powerful efficacy in defining patient risk to death individually, indicating the promising clinical usage in future therapeutic and follow-up management.
Accurate characterization of patient outcome, that not only permits treatment to be individualized but also improves patient follow-up economic benefit cost ratio, is markedly important for locally advanced NPC. Biomarkers that aberrantly expressed in tissue or circulation have been proven to be critical in guiding treatment selection and predicting disease prognosis , . Both the single biomarker reflecting the cancer phenotype in a microscopical manner and the TNM stage system predicting patient outcome in a macroscopical manner showed however a limited predictive power for individual outcome. In the present study, we designed a SVM model by integrating the expression levels of several tissues molecular biomarkers and NPC specific serological biomarkers to refine patient risk to death individually. We thus raised three key clinical implications of this SVM based prognostic model for locally advanced NPC: i) the molecular biomarkers included in this study were detected by IHC and ELISA and thus might be readily adaptable to clinical practice; ii) patients with inconsistent definition of risk to death between SVM1 and SVM2 models would be subjected to SVM3 for further determination. iii) the therapeutic regimen for advanced NPC might be redirected for the particular subgroup according to the SVM risk definition. Specifically, the patients with low risk definition could receive routine therapeutic regimen to avoid the serious side effect of intensive treatment modality. However, for patients with high risk definition, the standard chemoradiotherapy might not be sufficient. The target agent that specific to the particular molecular biomarker , and more aggressive chemotherapy regimen may be employed to maximize the therapeutic benefit.
In comparison with other data mining methods , such as neural networks (artificial and fuzzy) , clustering , genetic algorithms  and decision trees , SVM performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories . This feature thus presented great priority in predicting cancer patients prognosis that with two classifications (death VS alive). Additionally, Newman-Keuls test was used to deal with multiple comparisons that raised by multiple variables included in this study, ensuring the rational IHC score for further SVM analysis. More importantly, the higher generalization ability made SVM could train the model with limited cases by grouping several efficient features. Here, we designed the SVM models for advanced NPC by integrating TNM stage, tissue molecular features (nm23-H1, Pontin, cyclin D1, N-Cadherin, 14-3-3σ, Ki-67, Aurora-A, Bcl-2, Beclin 1, MMP 2, EZH2, TIMP 2, COX 2 and P27) along with EBV-related biomarkers (AER, EA-IgA and VCA-IgA), which reflected each patient tumorigenesis phenotype not only in macroscopic but also in microcosmic aspect. Thus, these multibiomarkers based models would provide more powerful efficacy in prediction of patient outcome. Indeed, our finding confirmed that the SVM models had strong ability in refining patient risk to death individually (Figure 3, high risk VS low risk: SVM1 24.1% VS 95.3%, SVM2 16.0% VS 94.2%, SVM3 16.7% VS 92.8%). However, we also observed the inconsistence in predicting patient outcome among SVMs in testing set regarding to age, Aurora-A, P27 and VCA-IgA. Taken Aurora-A for example, P value were 0.608, 0.683 and 0.098 for SVM1, SVM2 and SVM3, respectively. The underlying reasons might lie in the small cohort size in testing set since the significant prognostic value was observed when the cohort combined both training set and testing subgroup.
Taken together, our study demonstrated that multibiomarkers integrated SVM models led to more precise risk definition, offering a promising and individualized selection for future therapeutic regimen.
The 408 locally advanced NPC patients (Stage III and IVa) were enrolled in a randomized controlled trial (RCT) designed for therapeutic as well as SVM-biomarker study from August 2002 to April 2005 . In the therapeutic study, the therapeutic effect of induction chemotherapy+radiotherapy (IC/RT) was compared to induction chemotherapy+concurrent chemoradiotherapy (IC/CRT). In this biomarker study, randomized 103 patients (50 IC/CRT+53 IC/RT) were selected for multi-biomarkers-SVM prognosis analysis. Excluding 6 patients (4 IC/CRT+2 IC/RT) lost to 5-year follow-up, 97 patients (46 IC/CRT+51 IC/RT) were enrolled in this study. The baseline of patient clinicopathologic features of these two cohorts were displayed in Table 1. Of these 97 patients, randomly selected 48 patients (25 IC/RT+23 IC/CRT) were used as training set for SVM model education and the rest of 49 patients (26 IC/RT+23 IC/CRT) served as testing set. The cancer stage was defined according to the 1992 NPC staging system of China , . This study was approved by the Clinical Ethics Review Board at Cancer Center of Sun Yat-sen University, and written informed consent was obtained from all patients at their recruitment.
In this RCT, strict eligibility criteria protocol was employed as following: pathological confirmed as nonkeratinizing or undifferentiated carcinoma of nasopharynx (World Health Organization types of II or III); aged 18–65 years; performance status score: 0–2; clinical stage: III-IVa; leukocyte count (WBC) ≥4.0×109/L and platelet ≥100.0×109/L; total bilirubin (TBIL) and alanine aminotransferase (ALT) <2× the upper limit of normal value; creatinine (Cr) <1.5× the upper limit of normal value. Patients were excluded from this RCT with the following exclusion criteria: uncontrolled infection; previously received any anticancer therapy; pregnancy and lactation; prior malignancy; unsuitable for chemotherapy due to deficiency of liver, kidney, lung and heart.
The routine staging workup comprised of a detailed clinical examination of the head and neck, fiberoptic nasopharyngoscopy, magnetic resonance imaging (MRI) of the entire neck from the base of the skull, chest radiography, abdominal sonography, a complete blood count, and a biochemical profile. New Drug Statistical Treatment 8.0 software was employed to generate a random number table for further patient assignment.
In IC/RT subset, patients received two cycle of floxuridine+carboplatin (floxuridine 750 mg/m2, d1–5; carboplatin AUC = 6) chemotherapy and underwent radiotherapy thereafter at one week interval. In IC/CRT subgroup, one week after completion of two cycle floxuridine+carboplatin (floxuridine 750 mg/m2, d1–5; carboplatin AUC = 6), patients received radiotherapy and concurrent carboplatin (AUC = 6) chemotherapy on day 7, 28 and 49, respectively.
Prior to and after the carboplatin infusion, 1000–1500 ml normal saline, 20 g (250 ml) and normal saline 2000–3000 ml were respectively given to patients. Aheading of drug infusion, the 5-hydroxytryptamine-3 receptor antagonists and dexamethasone (20 mg) were used to guard against vomiting. For patients with serious myelosuppression, the chemotherapy schedule would be delayed to the serological leukocyte counts ≥3.0×109/L and platelet count ≥100.0×109/L. The carboplatin dose adjustment was based on the level of posttreatment creatinine clearance. When posttreatment serological creatinine clearance ≥60 mL/min, the original regimen could be maintained at the next cycle of chemotherapy. When the creatinine clearance decreased to 40–59 mL/min, a reduction of 50% carboplatin dose was required for the next cycle of chemotherapy. Once the serum creatinine clearance was less than 40 mL/min, carboplatin should be removed at the next cycle of chemotherapy.
The traditional Co60 γ-ray or linear accelerator 6–8 MV photon based two-dimensional technique was administered for radiotherapy. The radiation fields were determined by the extension of the tumor and local regional cervical lymph node invasiveness. The target radiation fields, including the tumor and a 2-cm marginal extension in all directions, obtained at least 90% of the mid-depth central axis dose. During the first course, two lateral opposing faciocervical portals were exposed to 36–40 Gy irradiation. At the second course, facio-cervical splitting portals course was employed. When the oropharynx was invaded, the facio-cervical portals would be used in these 2 courses, followed by 8–12 Mev electric beam irradiation at the posterocervical triangular regions. The anterior nasal region (6–8 Gy) or the parapharyngeal region (6–8 Gy) would be irradiated for subset with regional tumor invasion.
The accumulated radiation dose of 68–72 Gy, with 2 Gy daily fractions and 5 days per week, was given to the primary tumor. Additional boosted 8 to 12 Gy would be delivered to subgroup with residual tumor and destructed skull base. The neck region obtained 50 to 70 Gy radiation according to the extent of the lymph node tumorigenic invasiveness. For lymph node negative and positive invaded necks, 50 Gy and 60 to 70 Gy radiation would respectively be given.
Immunohistochemical (IHC) staining and EBV-related serological antibodies assay
Both tissue microassays and IHC were performed as previously described , . The candidate biomarkers consisted of reported prognostic markers with high predictive value and a number of key tumorigenesis signaling pathways related molecules . A total of 38 biomarkers (Figure 1) representing 6 pathological signaling pathways related to NPC disease progression, consisting of cell cycle: Cyclin D1 (Cell Signaling, #2978, 1∶200 dilution), 14-3-3σ (Santa Cruz, SC-100638, 1∶200 dilution), Aurora-A (Upstate, 1∶200 dilution), CENP-H (Santa Cruz, SC-22792, 1∶200 dilution), Stathmin (Cell Signaling, #3352, 1∶200 dilution), P21WAF1 (Santa Cruz, SC-817, 1∶200 dilution), CDC2 (Santa Cruz, SC-53, 1∶200 dilution), P27 (Millipore, clone Y236, 1∶200 dilution), ERK (Santa Cruz, SC-94, 1∶200 dilution), p-ERK (Santa Cruz, SC-7383, 1∶100 dilution), Ki-67 (Santa Cruz, SC-23900, 1∶50 dilution); migration & invasion: E-Cadherin (Cell Signaling, #4065, 1∶50 dilution), β-catenin (Millipore, MAB2081, 1∶100 dilution), N-Cadherin (Upstate, clone 13A9, 1∶200 dilution), Snail (Abcam, ab70983, 1∶50 dilution), Twist (Santa Cruz, SC-102032, 1∶50 dilution), c-Met (Santa Cruz, SC-161, 1∶200 dilution), nm23-H1 (Santa Cruz, SC-56928, 1∶200 dilution); tumor microenvironment: HIF-1α Millpore, MAB5382, 1∶200 dilution), COX2 (Santa Cruz, SC-58344, 1∶200 dilution), MMP-2 (Santa Cruz, SC-53630, 1∶200 dilution), MMP-9 (Santa Cruz, SC-6840, 1∶200 dilution), TIMP-2 (Santa Cruz, SC-21753, 1∶200 dilution), VEGF (MaiXin, 1∶200 dilution), CD31 (MaiXin, 1∶200 dilution) microvessel density (MVD), CD34 (MaiXin, 1∶200 dilution) MVD, CD31+/CD34− MVD, CD8 (MaiXin, 1∶200 dilution), CD45RO (MaiXin, 1∶200 dilution) and D2–40 (MaiXin, 1∶200 dilution); apoptosis & autophagy: Bax (Santa Cruz, SC-7480, 1∶200 dilution), Bcl-2 (Santa Cruz, SC-7382, 1∶50 dilution), Survivin (Santa Cruz, SC-47750, 1∶100 dilution), AKT 1 (Cell Signaling, #4685, 1∶100 dilution), Pontin (Cell Signaling, #8959, 1∶100 dilution), Beclin 1 (Santa Cruz, SC-11427, 1∶200 dilution); epigenetic related molecule EZH2 (Cell Signaling, #4905, 1∶200 dilution) and EBV related molecule LMP 1 (Santa Cruz, SC-57721, 1∶200 dilution), were tested in this study. A negative control was utilized by changing the specific primary antibody with non-immune serum immunoglobulins at the 1∶200 dilutions. Serological EBV related antibodies, EA-IgA, VCA-IgA and anti-enzyme rate (AER) of EBV DNase-specific neutralizing antibody, were tested prior to oncologic treatment by ELISA method , . Antibody testing of each used antibody was done prior to the IHC staining.
Semi-quantitative assessment of IHC
The expression profile of each biomarker was evaluated by combined assessment of staining intensity and extent as we previously described , . Immunohistochemical (IHC) staining was evaluated and scored by two independent pathologists (X.-J.F. & J.X.) blindly to clinical follow-up data. The third pathologist will arbitrate the discrepancy arose between these two pathologists. Totally, the ratio of complete agreement of the overall score reached to 87%. The MVD were evaluated by counting CD31+ capillaries, CD34+ capillaries, CD31+/CD34− capillaries and D2–40 (lymphangial specific marker) in the three most vascularized areas (“hotspots”) , . The immune microenvironment reactivity was assessed by counting positive stained CD8 and CD45RO T cells in the three “hotspots” .
Selection of cutoff score for each biomarker “positive” expression
The receiver operating characteristic (ROC) curve analysis was subjected to the selection of cutoff score in the training set as we previously reported . Briefly, the sensitivity and specificity for patient outcome being studied at each score were plotted to generate a ROC curve. The score localized closest to the point at both maximum sensitivity and specificity, the point (0.0, 1.0) on the curve, was selected as the cutoff score that might be correctly classified patient outcome as death or alive.
Clinical outcome assessment
The patients in this RCT were all followed up with strict protocol. After the completion of therapy, patients were observed at 3-month intervals during the first 3 years and at 6-month intervals thereafter. The latest date of each patient being followed up was May 15 2010, ensuring the accurate 5-year survival condition of each patient was obtained and readily for further SVM analysis. The 5-year survival condition was defined as death or alive at the appointed date of 5 years post-diagnosis. Overall survival (OS) was defined as the time from diagnosis to the date of death or censored at the latest date.
Sample size estimation
Given the robust capacity of SVM prognostication model in optimally separating the data into two categories, a relative small sample size might be enough to achieve the goal of powerful prognosis prediction. In this study, the STATA COX regression was used to estimate the sample size based on 38 biomarkers expression level. In light of the 24.2% OS events probability in therapeutic regimen RCT, a total of 95 cases were required to achieve 90% power for a 5% significance level assuming the OS HR increased >ten-fold for SVMs model. When considering the subgroup that might loss to follow-up, the cases size was further enlarged to 103.
Support vector machines (SVM) model construction
In this study, we considered the patient prognosis as a two-class pattern classification (death VS alive). We employed a vector to denote the pattern of n components for a patient. In our binary classification, patient who survived for more than 5 years was denoted by whereas represented the patient survived less than 5 years. The overall patients were randomly divided into two subgroups: training set that was employed to construct decision function D (•), and testing set was used to test the predictive accuracy of decision function.
The main procedure for SVM classification involved two steps. Firstly, the input feature vector X is mapped into a higher dimensional space H through an underling nonlinear mapping φ. Secondly, the linear classification is applied in this mapping space. A SVM decision function can be rewritten as , where parameter denotes the support vector. The unknown parameter ω could be obtained through minimization of the following structural risk function: (1), . The value of is a user-specified positive parameter, and ξi are slack variables. Given the two classes are separable, minimizing the structural risk in (1) contributes to the maximal separating margin between these two classes.
In this study, the performance of classification was calculated using the following loss function: , where and denoted the overall accuracy and sensitivity for set S, respectively. The definition of and was (2), .
To maximize the area under ROC curve, we also defined another loss function based on ROC parameter as following: (3), where indicated the area under ROC curve for the testing set S.
The classical RBF kernel function was used in SVM model construction. To find optimal parameters of SVM model, including kernel size σ and regularization parameter C, standard Leave-one-out cross-validation was employed to search over a grid .
The multivariate Cox proportional hazards model was utilized to estimate the hazard ratio (HR) and 95% confidence interval (CI). The survival probabilities difference between patients subsets in OS were determined by Kaplan-Meier analysis and log-rank tests. A two-tailed P<0.05 was considered statistically significant. Statistical analysis was performed using SPSS v. 17.0 (SPSS, Inc., Chicago, IL).
Conceived and designed the experiments: XBW XJF MHH QL Y. Zhao. Performed the experiments: XBW XJF MYC Y. Zhang JX XYW. Analyzed the data: HMC XBW XJF QL. Contributed reagents/materials/analysis tools: YXZ MHH QL HBL. Wrote the paper: XBW QL HMC.
- 1. Wei WI, Sham JS (2005) Nasopharyngeal carcinoma. Lancet 365: 2041–2054.
- 2. Chen HY, Yu SL, Chen CH, Chang GC, Chen CY, et al. (2007) A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 356: 11–20.
- 3. Chua DT, Ma J, Sham JS, Mai HQ, Choy DT, et al. (2005) Long-term survival after cisplatin-based induction chemotherapy and radiotherapy for nasopharyngeal carcinoma: a pooled data analysis of two phase III trials. J Clin Oncol 23: 1118–1124.
- 4. McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, et al. (2005) Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst 97: 1180–1184.
- 5. Kalyankrishna S, Grandis JR (2006) Epidermal growth factor receptor biology in head and neck cancer. J Clin Oncol 24: 2666–2672.
- 6. Rubin Grandis J, Melhem MF, Gooding WE, Day R, Holst VA, et al. (1998) Levels of TGF-alpha and EGFR protein in head and neck squamous cell carcinoma and patient survival. J Natl Cancer Inst 90: 824–832.
- 7. Bonner JA, Harari PM, Giralt J, Cohen RB, Jones CU, et al. (2010) Radiotherapy plus cetuximab for locoregionally advanced head and neck cancer: 5-year survival data from a phase 3 randomised trial, and relation between cetuximab-induced rash and survival. Lancet Oncol 11: 21–28.
- 8. Kim ES, Herbst RS, Lee JJ, Blumenschein G, Tsao A, et al. (2009) Phase II randomized study of biomarker-directed treatment for non-small cell lung cancer (NSCLC): The BATTLE (Biomarker-Integrated Approaches of Targeted Therapy for Lung Cancer Elimination) clinical trial program. Journal of Clinical Oncology, 2009 ASCO Annual Meeting Proceedings (Post-Meeting Edition) 27: 8024.
- 9. Vermeulen J, De Preter K, Naranjo A, Vercruysse L, Van Roy N, et al. (2009) Predicting outcomes for children with neuroblastoma using a multigene-expression signature: a retrospective SIOPEN/COG/GPOH study. Lancet Oncol 10: 663–671.
- 10. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, et al. (2006) A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med 355: 570–580.
- 11. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–536.
- 12. Cho WC (2007) Nasopharyngeal carcinoma: molecular biomarker discovery and progress. Mol Cancer 6: 1.
- 13. Wan XB, Long ZJ, Yan M, Xu J, Xia LP, et al. (2008) Inhibition of Aurora-A suppresses epithelial-mesenchymal transition and invasion by downregulating MAPK in nasopharyngeal carcinoma cells. Carcinogenesis 29: 1930–1937.
- 14. Anderson B, Hardin JM, Alexander DD, Grizzle WE, Meleth S, et al. (2010) Comparison of the predictive qualities of three prognostic models of colorectal cancer. Front Biosci (Elite Ed) 2: 849–856.
- 15. Lancashire LJ, Lemetre C, Ball GR (2009) An introduction to artificial neural networks in bioinformatics–application to complex microarray and mass spectrometry datasets in cancer studies. Brief Bioinform 10: 315–329.
- 16. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4: 44–57.
- 17. Shah S, Kusiak A (2007) Cancer gene search with data-mining and genetic algorithms. Comput Biol Med 37: 251–261.
- 18. Neville P, Tan PY, Mann G, Wolfinger R (2003) Generalizable mass spectrometry mining used to identify disease state biomarkers from blood serum. Proteomics 3: 1710–1715.
- 19. Vapnik V, Chapelle O (2000) Bounds on error expectation for support vector machines. Neural Comput 12: 2013–2036.
- 20. Huang PY, Mai HQ, Luo DH, Qiu F, Li NW, et al. (2009) Induction-concurrent chemoradiotherapy versus induction chemotherapy and radiotherapy for locoregionally advanced nasopharyngeal carcinoma. Ai Zheng 28: 1033–1042.
- 21. Minsky BD, Cohen AM, Kemeny N, Enker WE, Kelsen DP, et al. (1992) Combined modality therapy of rectal cancer: decreased acute toxicity with the preoperative approach. J Clin Oncol 10: 1218–1224.
- 22. Hong MH, Mai HQ, Min HQ, Ma J, Zhang EP, et al. (2000) A comparison of the Chinese 1992 and fifth-edition International Union Against Cancer staging systems for staging nasopharyngeal carcinoma. Cancer 89: 242–247.
- 23. Nocito A, Kononen J, Kallioniemi OP, Sauter G (2001) Tissue microarrays (TMAs) for high-throughput molecular pathology research. Int J Cancer 94: 1–5.
- 24. Meng HM, Zheng P, Wang XY, Liu C, Sui HM, et al. (2010) Overexpression of nanog predicts tumor progression and poor prognosis in colorectal cancer. Cancer Biol Ther 9:
- 25. Tang JW, Rohwader E, Chu IM, Tsang RK, Steinhagen K, et al. (2007) Evaluation of Epstein-Barr virus antigen-based immunoassays for serological diagnosis of nasopharyngeal carcinoma. J Clin Virol 40: 284–288.
- 26. Huang D (1990) Epstein-Barr virus specific kinase and their antibodies. Academic Journal of Sun Yat-sen University of Medical Sciences 3: 1–4.
- 27. Qian CN, Huang D, Wondergem B, Teh BT (2009) Complexity of tumor vasculature in clear cell renal cell carcinoma. Cancer 115: 2282–2289.
- 28. Iwakiri S, Nagai S, Katakura H, Takenaka K, Date H, et al. (2009) D2–40-positive lymphatic vessel density is a poor prognostic factor in squamous cell carcinoma of the lung. Ann Surg Oncol 16: 1678–1685.
- 29. Pages F, Kirilovsky A, Mlecnik B, Asslaber M, Tosolini M, et al. (2009) In situ cytotoxic and memory T cells predict outcome in patients with early-stage colorectal cancer. J Clin Oncol 27: 5944–5951.