The Molecular Subtype Classification Is a Determinant of Sentinel Node Positivity in Early Breast Carcinoma

Introduction: Several authors have underscored a strong relation between the molecular subtypes and the axillary status of breast cancer patients. The aim of our work was to decipher the interaction between this classification and the probability of a positive sentinel node biopsy. Materials and Methods: Our dataset consisted of 2654 early-stage breast cancer patients. Patients initially treated by conservative breast surgery plus sentinel node biopsy were selected. A multivariate logistic regression model was trained and validated. The interaction covariate between the ER and HER2 markers was a forced input of this model. The performance of the multivariate model in the training set and the two validation sets was analyzed in terms of discrimination and calibration. The probability of axillary metastasis was detailed for each molecular subtype. Results: The interaction covariate between ER and HER2 status was a stronger predictor (p = 0.0031) of positive sentinel node biopsy than the ER status by itself (p = 0.016). A multivariate model to determine the probability of sentinel node positivity was defined with the following variables: tumour size, lympho-vascular invasion, molecular subtypes and age at diagnosis. This model showed similar results in terms of discrimination (AUC = 0.72/0.73/0.72) and calibration (HL p = 0.28/0.05/0.11) in the training and validation sets. The interaction between molecular subtypes, tumour size and sentinel node status was approximated. Discussion: We showed that biologically-driven analyses are able to build new models with higher performance in terms of breast cancer axillary status prediction. The molecular subtype classification strongly interacts with the axillary and distant metastasis process.


Introduction
Gene expression profiling of invasive breast carcinoma has highlighted three main categories of breast cancer with very specific features [luminal-like, basal-like, HER2-like] [1]. Wirapati et al. [2] showed that three main genes [ESR1, HER2 and STK6, a marker of proliferation] form the biological backbone of this classification. Although the methodology used to determine the molecular subtypes still has to be improved [3], many publications have validated this classification [2] [4]. It has been shown that the molecular subtypes differ in their response to neoadjuvant systemic treatment [5], loco-regional recurrence [6], metastasis pattern [7,8], time to metastasis and overall survival [3]. Furthermore, several authors have underscored a strong relation between the molecular subtype classification and the axillary status of breast cancer patients [9][10][11][12][13][14][15][16]. As the nodal status is the most robust and the strongest factor correlated to overall survival in breast cancer patients, and is one of the major determinants in therapeutic decisions, axillary staging (either by sentinel node biopsy or axillary lymph node dissection) is a mandatory step in breast cancer management. Many predictors of axillary lymph node metastases have been published previously. Tumour size, tumour grade, tumour location, presence of lymphatic/vascular invasion, high MIB-1 index, age at diagnosis, S phase, estrogen receptor (ER) status, progesterone receptor (PR) status and HER2 status are independent variables identified in these studies [17][18][19][20][21][22][23][24][25].
The aim of our work was to decipher the relation between the molecular subtype classification, as defined by a combination of ER and HER2 status evaluated by immunohistochemistry (IHC) and confirmed by FISH in case of IHC-HER2 2+, and the probability of a positive sentinel node biopsy. Using one training set and two validation sets, we showed that introducing the interaction covariate between the ER and HER2 biomarkers helps to identify, before surgery, patients with a high risk of axillary metastasis. Furthermore, we showed for each molecular subtype a very specific correlation pattern between the tumour size and the probability of a positive sentinel node biopsy. We hypothesized from these results that the axillary lymph node metastasis process is predominantly correlated to intrinsic biological properties in the ER negative HER2 negative breast cancer subgroup, whereas stochastic events, tumour size, growth rate and lympho-vascular invasion are the main determinants in the ER positive or HER2 positive breast cancer subgroups.

Materials and Methods
The dataset consisted of 2654 early-stage breast cancer patients with a normal physical examination of the axilla, initially treated by conservative surgery plus a sentinel node (SN) biopsy. The SN biopsy was performed with patent blue dye, radioisotope or a combination of both, in line with French recommendations and as previously described [26]. Axillary lymph node dissection was performed during the same procedure when the SN was positive by imprint cytology or frozen section. A second operation was performed when either hematoxylin-eosin staining or immunohistochemistry revealed tumour cells in the SN postoperatively, including isolated tumour cells. Pathologic SN examination methods were as reported previously [26]. Patients receiving a neoadjuvant treatment (chemotherapy, hormone therapy or radiotherapy) or presenting with a locoregional recurrence were systematically excluded from the study.
The clinical data (age at diagnosis, treatment protocols) were extracted from the Institut Curie prospective breast cancer database and from the Hospital Tenon, department of gynecology, prospective breast cancer database.

Tumor samples
The following histological features were retrieved: tumour type, tumour size, histological grade according to the Elston and Ellis grading system (Histopathology 1991), Mitotic Index, Lympho Vascular Invasion, Estrogen Receptor status, Progesterone Receptor status, HER2 status, number of positive sentinel nodes and number of sentinel nodes. The Mitotic Index (MI) corresponded to the number of mitoses observed in 10 successive high power fields (HPF) using a microscope with a 40x/0.7 objective and a 10x ocular. The Mitotic Index was assessed on histological sections stained with Hematein, Eosin and Saffron. The criteria of van Diest et al. were used to define mitotic figures [27,28]. Estrogen Receptor (ER) and Progesterone Receptor (PR) immunostainings were determined as follows. After rehydration and antigen retrieval in citrate buffer (10 mM, pH 6.1), the tissue sections were stained for estrogen receptor (clone 6F11, Novocastra, 1/200) and progesterone receptor (clone 1A6, Novocastra, 1/200). Staining was revealed using the Vectastain Elite ABC peroxidase mouse IgG kit (Vector, Burlingame, CA) with diaminobenzidine (Dako A/S, Glostrup, Denmark) as chromogen. Positive and negative controls were included in each slide run. Cases were considered positive for ER and PR according to standardized guidelines, using a threshold of ≥10% of positive nuclei per carcinomatous duct. HER2 over-expression status was determined according to the American Society of Clinical Oncology (ASCO) guidelines [29].
The SLN histopathological assessment protocol has been published by Fréneaux et al. [26]. SLN samples were serially sectioned and stained with HE. HE-negative cases were then analyzed by serial sectioning with IHC. Positive sentinel nodes were classified into two groups according to the size of the metastasis: macrometastasis (>2 mm) and micrometastasis (≤2 mm), detected either by HE staining or by cytokeratin IHC.

Statistical model
Baseline characteristics were compared between groups using Chi-square or Fisher's exact tests for categorical variables and Student's t-tests for continuous variables. To develop well-calibrated and exportable nomograms for the prediction of sentinel node positivity, we built a multivariate logistic regression model in a training cohort and validated it in two independent validation cohorts. First, univariate logistic regression analysis was performed to test the association of the sentinel lymph node status with the following variables: patient age, tumour diameter, histological type of tumour, histological grade, lymphovascular invasion, ER status, PR status and HER2 status. The interaction covariate between ER and HER2 status was also tested. The log-linearity of the continuous variables was studied by fitting polynomial functions of different degrees or step functions in a logistic model. Age at diagnosis was subdivided into 3 classes and the tumour size was kept as a continuous variable. Second, a multivariate logistic regression analysis was performed to determine the probability of having a positive sentinel node biopsy procedure and to build a nomogram. Significant variables identified through univariate analysis were used as input in the multivariate analysis. The multivariate model performance was quantified with respect to discrimination and calibration. Discrimination (i.e., whether the relative ranking of individual predictions is in the correct order) was quantified with the area under the receiver operating characteristic curve. Calibration (i.e., agreement between observed outcome frequencies and predicted probabilities) was studied with graphical representations of the relationship between the observed outcome frequencies and the predicted probabilities (calibration curves): the grouped proportions versus the mean predicted probability in groups defined by deciles, and the logistic calibration, were represented. The calibration was tested using the Hosmer-Lemeshow test. This test compares the mean predicted probabilities and the observed proportions using a chi-square distribution with 8 degrees of freedom for the training set and 9 degrees of freedom for the validation sets. The analyses were performed using R software (http://cran.r-project.org).
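The discrimination and calibration measures described above can be sketched in a few lines. The following is a minimal illustration in standard-library Python, not the R code used for the study: the function names `auc`, `hosmer_lemeshow` and `chi2_sf` are hypothetical, the risk groups are assumed to be equal-sized quantile groups of the predicted probabilities, and the chi-square tail probability is computed with a textbook recurrence rather than a statistics library.

```python
import math


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # A pair is concordant when the positive case has the higher prediction;
    # ties count for one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        # Groups with a mean prediction of exactly 0 or 1 are skipped
        # to avoid dividing by zero.
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting point for df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q
```

With decile groups, the p-value would be obtained as `chi2_sf(stat, 8)` for the training set and `chi2_sf(stat, 9)` for a validation set, matching the degrees of freedom stated in the text.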
A Java web-based interface is available at www.cancerdusein.curie.fr. The study was approved by the breast cancer study group of the Institut Curie.

Results
Table 1 summarizes the training set (1543 patients) and the two validation sets (615 and 496 patients). These three populations differ significantly in terms of age at diagnosis, ER status, HER2 status, histological grade, lympho-vascular invasion, histological subtypes, number of sentinel nodes removed and number of positive sentinel node biopsies. These differences are of major interest in a validation process designed to test the robustness of a classification algorithm. The training set (Table 2) was composed of 516 patients with a positive sentinel node biopsy (33.4%) and 1027 patients with a negative sentinel node biopsy (66.6%). Patients with a positive sentinel node biopsy differed from those with a negative biopsy in terms of age at diagnosis, ER status, pathological tumour size, histological grade, mitotic index, lympho-vascular invasion and number of sentinel nodes removed. The proportion of patients with a positive HER2 status was not significantly different between the two groups [8.6% vs 7.6%, p = 0.58]. The interaction covariate between ER and HER2 status [ERneg HER2neg, ERpos HER2neg, ERpos HER2pos, ERneg HER2pos] was a stronger predictor (p = 0.0031) of positive sentinel node biopsy than the ER status by itself (p = 0.016). We designed a multivariate logistic regression model to determine the probability of having a positive sentinel node biopsy (Table 3). The initial input was based on the variables found significant in the univariate analysis. Tumour size, lympho-vascular invasion, the molecular subtype classification as defined by the interaction covariate between ER and HER2 status, and age at diagnosis were the final inputs of this model. Odds ratios, confidence intervals and p-values are summarized in Table 3.
The logistic regression parameters indicate the relative degree to which each of these variables is correlated to nodal metastasis. The performance of the multivariate model in the training set and the two validation sets was analyzed in terms of discrimination and calibration. The model showed similar discrimination (AUC = 0.72/0.73/0.72) and calibration (HL p = 0.28/0.05/0.11) in the training and validation sets. We then approximated the relation between tumour size and the probability of having a positive sentinel node biopsy procedure for each molecular subtype (Figure 3, Table 4). We showed an almost null slope of the correlation axis in the ER negative HER2 negative subgroup: the probability of having an axillary metastasis was around 20% whatever the tumour size. Both ER positive tumour subgroups (either HER2 negative or HER2 positive) showed an intermediate slope, and the ER negative HER2 positive tumour subgroup showed the steepest slope. Tumour size was a major determinant of axillary metastasis development only in the HER2 positive or ER positive tumour subgroups. For breast cancers of less than 30 mm, sentinel node biopsy was associated with an axillary metastasis rate below 30% in the ER negative HER2 negative subgroup and above 50% in the other three subgroups. For each molecular subtype as defined by a combination of ER and HER2 immunohistochemistry markers, we summarized (Table 5) eight publications addressing the percentage of axillary metastases [9][10][11][12][13][14][15][16].
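The four-level ER x HER2 interaction covariate and the subtype-specific relation between tumour size and the probability of a positive sentinel node can be illustrated with a short sketch. This is a simplified stand-in, assuming one intercept and one size slope per subtype and ignoring lympho-vascular invasion and age; the function names and the coefficient dictionaries are hypothetical, and the numeric values are placeholders chosen only to reproduce a flat curve near 20%, not fitted values from Table 3 or Table 4.

```python
import math

# The four levels of the interaction covariate, as listed in the text.
SUBTYPES = ("ERneg HER2neg", "ERpos HER2neg", "ERpos HER2pos", "ERneg HER2pos")


def molecular_subtype(er_positive, her2_positive):
    """Map the two binary markers to one of the four subtype labels."""
    er = "ERpos" if er_positive else "ERneg"
    her2 = "HER2pos" if her2_positive else "HER2neg"
    return f"{er} {her2}"


def sn_positive_probability(size_mm, subtype, intercepts, slopes):
    """Logistic probability with a subtype-specific intercept and size slope."""
    if subtype not in SUBTYPES:
        raise ValueError(f"unknown subtype: {subtype!r}")
    linear = intercepts[subtype] + slopes[subtype] * size_mm
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical placeholder coefficients (NOT the study's estimates):
# a zero slope with an intercept of logit(0.20) gives a probability of
# about 20% at every tumour size, mimicking an "almost null slope".
flat_intercepts = {"ERneg HER2neg": math.log(0.20 / 0.80)}
flat_slopes = {"ERneg HER2neg": 0.0}

subtype = molecular_subtype(er_positive=False, her2_positive=False)
for size in (5, 15, 30):
    p = sn_positive_probability(size, subtype, flat_intercepts, flat_slopes)
    print(f"{subtype}, {size} mm: {p:.2f}")
```

A subgroup in which tumour size matters would simply carry a positive slope, so that the predicted probability rises with the tumour diameter.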

Discussion
The aim of our work was to decipher the relation between the molecular subtype classification, as defined by a combination of ER and HER2 status, and the probability of a positive sentinel node biopsy. Using one training set and two validation sets, we showed that introducing the interaction covariate between the ER and HER2 biomarkers helps to identify, before surgery, patients with a high risk of axillary metastasis. Using tumour size, lympho-vascular invasion, the molecular subtype classification and age at diagnosis, we designed a robust multivariate logistic regression model to determine the probability of having a positive sentinel node biopsy. We validated this model in two independent and very different datasets and showed a very similar performance in terms of calibration and discrimination. Lu et al. identified a similar multivariate model to predict lymph node metastases that included tumour size, lympho-vascular invasion and tumour subtypes defined by a combination of ER status, HER2 status and modified Bloom and Richardson grade [9]. Furthermore, we identified for each molecular subtype a very specific correlation pattern between the tumour size and the probability of a positive sentinel node biopsy. In the ER negative HER2 negative breast cancer subgroup, the nodal status was almost independent of the tumour size, with a relatively constant rate of axillary metastases of around 20%. Conversely, the ER positive or HER2 positive breast cancer subgroups showed a strong and almost linear correlation between the tumour size and the percentage of axillary metastases.
Tumour size and lympho-vascular invasion are the main predictors of axillary metastases identified in many studies [17][18][19][20][21][22][23][24][25]. However, tumour size and lympho-vascular invasion have never been robustly related to any pathological or biological marker. High-throughput gene expression profile analysis failed to identify a set of genes correlated to the nodal status, the tumour size or the lympho-vascular invasion [9]. The gene expression profiles of paired primary tumours and corresponding axillary metastases have previously been shown to be very similar [30]. From these observations, it has been concluded that growth rate, time and stochastic factors seem to be the main determinants of the nodal status. However, several authors have recently underscored a significant relation between the molecular subtype classification and the axillary status of breast cancer patients [9][10][11][12][13][14][15][16]. This evidence supports the idea that nodal status is still a potential signature of the intrinsic biological properties of a primary tumour. Perou et al. identified the molecular subtype classification in the late 1990s, and it was a major breakthrough in breast cancer research [1]. This classification underscored the great heterogeneity of breast cancer. It is now common knowledge that the pathological characteristics, the aCGH profiles, the gene and miRNA expression profiles and the altered pathways are dramatically different between these categories, which supports a view of breast cancer as a disease composed of very different and independent molecular subgroups.
For each molecular subtype as defined by a combination of ER and HER2 immunohistochemistry markers, we summarized (Table 5) eight publications addressing the percentage of axillary metastases [9][10][11][12][13][14][15][16]. As reported in our study, the ER negative HER2 negative tumour subgroup has the lowest rate of axillary metastasis and the HER2 positive tumour subgroup has the highest. We hypothesized from all these results that the axillary lymph node metastasis process is predominantly related to intrinsic biological properties in the ER negative HER2 negative breast cancer subgroup, whereas stochastic events, tumour size, growth rate and lympho-vascular invasion are the main determinants in both the ER positive and the HER2 positive breast cancer subgroups. As the molecular subtypes differ in terms of relapse-free survival and overall survival [ER negative HER2 negative and HER2 positive breast cancer patients experience a shorter relapse-free survival and overall survival] and the nodal status is the strongest prognostic predictor, we highlighted a very complex interaction network between the primary tumour, the nodal status and the distant metastases. The molecular subtype classification is one determinant of this network.
Finally, we showed that biologically-driven analyses are able to build new models with higher performance in terms of breast cancer axillary status prediction. The molecular subtype classification is the first stratification level of breast carcinoma and strongly interacts with the axillary and distant metastasis process. Large integrative analyses have to be performed to explain why ER negative HER2 negative tumours have a low rate of axillary metastasis and a high rate of distant metastases, whereas HER2 positive tumours have a rate of axillary metastases strongly related to the tumour size and a high rate of distant metastases.