Impact of time to local recurrence on the occurrence of metastasis in breast cancer patients treated with neoadjuvant chemotherapy: A random forest survival approach

Background We studied the relationship between time to ipsilateral breast tumor recurrence (IBTR) and distant metastasis-free survival (DMFS) in patients with breast cancer treated by neoadjuvant chemotherapy (NAC). Methods Between 2002 and 2012, 1199 patients with primary breast cancer were treated with NAC. Clinical, radiological and pathological data were retrieved from medical records. Multivariate analysis was performed with the random survival forest (RSF) method, to evaluate the relationship between time to local recurrence and DMFS. Results Time to IBTR, local recurrence and molecular subtype were the factors most strongly associated with DMFS. In the total population, DMFS increased linearly with recurrence time, up to 50 months. For recurrences after 50 months, DMFS was similar for all times to recurrence. Considering molecular subtypes separately, the threshold was similar for the TNBC subtype (50 months), but appeared to occur later for the luminal and HER2-positive subtypes (75 months). Conclusion A threshold of 50 months seems to differentiate between early and late recurrences and could be used to guide the medical management of local breast tumour recurrences.

Introduction Neoadjuvant chemotherapy (NAC) is currently indicated as a means of allowing breast-conserving surgery in cases of breast carcinoma with a poor prognosis [1,2]. Studies comparing chemotherapy in adjuvant and neoadjuvant settings have reported similar prognoses for overall survival. However, consistent with the higher rates of conservative surgery in patients receiving NAC, local relapses are more frequent for NAC than for adjuvant chemotherapy, reaching 22% at 10 years [3][4][5][6].
Ipsilateral breast tumour recurrence (IBTR) is an independent risk factor for both distant metastasis [7][8][9] and death from breast cancer [10][11][12]. It is unclear whether this association is causal, whether IBTR is an indicator of active disease, or both. One of the major challenges in the management of local recurrences is distinguishing true recurrences, corresponding to the regrowth of resistant cells after initial treatment, from new primary tumours. This distinction is important for treatment, because true recurrences provide evidence of uncontrolled disease requiring radical treatment, rather than the secondary conservative surgery that could be considered for a new tumour. The risk of metastasis is also probably different for these two entities and should be considered as such for decisions concerning systemic treatment.
Time to recurrence seems to be a relevant surrogate for distinguishing "late recurrences", potentially corresponding to new primary tumours, from "early recurrences", more likely to correspond to progression of the initial disease. Some studies of adjuvant chemotherapy have reported better outcomes for patients with "late recurrences" than for those with "early recurrences" [13][14][15][16][17]. However, only two studies have investigated the impact of time to local recurrence on distant metastasis-free survival in the neoadjuvant setting [9,18]. One limitation common to all these studies is that the threshold for distinguishing between early and late recurrences is chosen by the authors, often arbitrarily as the median time to recurrence, a time point that does not necessarily separate subgroups with good and poor prognoses. The accurate classification of IBTRs as "early" or "late" recurrences is essential in the neoadjuvant setting, to improve both prognostic evaluations and therapeutic choices.
Traditional statistical techniques, such as Cox's proportional hazards (PH) models, are generally used to identify potential risk factors. However, Cox models involve restrictive assumptions, such as proportional hazards and a linear effect of the covariates on the log hazard [19]. These assumptions may bias the analysis of prognosis in the long-term follow-up of breast cancer and hinder the identification of early or late markers of prognosis [20,21]. For this reason, Baulies et al. introduced a time-dependent effect into their analysis and showed that early local recurrence (within five years) in patients treated by conservative surgery was a prognostic factor strongly associated with the development of distant metastases [18].
However, the relationships between clinical outcome and the predictors considered are potentially complex. It may be difficult to identify interactions, particularly those involving multiple variables, such as three-way interactions. This complexity could bias the estimated relationship between IBTR, the time to local recurrence (time to IBTR) and distant metastasis-free survival (DMFS). These difficulties can be handled automatically by machine-learning methods, such as tree-based approaches. Random survival forests (RSFs) are a non-parametric, tree-based ensemble learning method that can be used to select and rank variables [19,22-25] without the limitations of Cox models. One of the key advantages of the RSF approach is the adaptive discovery of nonlinear effects and interactions. This approach uses all the available variables in the dataset to build the response predictor, without the need to specify the functional form of the covariates explicitly. Several studies have reported that RSF methods perform better in this context than the traditional Cox PH model [22,26,27].
We used the RSF method to study the relationship between time to IBTR, DMFS and overall survival (OS) in a large series of breast cancer patients treated with NAC.

Materials and methods
NEOREP ("Reponse à la chimiothérapie neoadjuvante") is a retrospective cohort follow-up study (CNIL declaration number 1547270) of patients treated with NAC for a unifocal invasive breast carcinoma at Institut Curie (Paris, France) between 2002 and 2012. All experiments were performed retrospectively and in accordance with the French Bioethics Law 2004-800 and the French National Institute of Cancer (INCa) Ethics Charter, after approval by the Institut Curie review board and ethics committee (Comité de Pilotage of the Groupe Sein). In the French legal context, our institutional review board waived the need for written informed consent from the participants. Moreover, women were informed of the research use of their tissues and did not object to such research. Data were analysed anonymously. All cases were eligible for inclusion.
Clinical, radiological and pathological data, such as patient age, menopausal status, T stage, N stage, histological tumour grade, oestrogen receptor (ER), progesterone receptor (PR) and HER2 status, and pathological response to NAC, were recorded.
The pathological diagnosis was confirmed in all patients by a core needle biopsy before treatment. Histological grade was determined as described by Elston and Ellis (1991), with a modified version of the Scarff-Bloom-Richardson grading system. Hormone receptors were analysed by immunohistochemistry. Tumours were considered to be positive for ER or PR if at least 10% of the carcinomatous cells displayed positive staining, as recommended by European guidelines [28]. HER2 expression status was determined according to American Society of Clinical Oncology guidelines [29]. The molecular subtype of each tumour was determined as follows: "luminal" for breast tumours positive for ER and/or PR, with no overexpression of HER2; "HER2-positive" for tumours with HER2 overexpression; and "triple-negative" for tumours displaying no staining for ER or PR and without HER2 overexpression.
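The subtype rules above amount to a small decision procedure. The Python sketch below encodes them, assuming boolean receptor calls have already been derived from immunohistochemistry and the ASCO HER2 assessment. The function names are hypothetical, and reading the ER/PR cut-off as "at least 10%" is an assumption about the threshold, not the authors' code.

```python
def receptor_positive(percent_stained: float, threshold: float = 10.0) -> bool:
    # Assumption: "positive" means at least `threshold` % of carcinomatous cells stained.
    return percent_stained >= threshold


def molecular_subtype(er_positive: bool, pr_positive: bool, her2_overexpressed: bool) -> str:
    """Three-class subtype following the rules stated in the text (illustrative sketch)."""
    if her2_overexpressed:
        # HER2 overexpression defines the HER2-positive group whatever the ER/PR status.
        return "HER2-positive"
    if er_positive or pr_positive:
        # ER and/or PR positive, without HER2 overexpression.
        return "luminal"
    # No ER or PR staining and no HER2 overexpression.
    return "triple-negative"


if __name__ == "__main__":
    er = receptor_positive(35.0)
    pr = receptor_positive(5.0)
    print(molecular_subtype(er, pr, her2_overexpressed=False))  # luminal
```

HER2 status is tested first because the HER2-positive definition does not depend on hormone receptor status.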
Patients were treated in accordance with national guidelines. Neoadjuvant chemotherapy regimens changed over time (anthracycline-based or sequential anthracycline-taxane regimens).
Surgery was performed four to six weeks after the end of chemotherapy. Patients underwent either mastectomy or breast-conserving surgery (lumpectomy), with axillary lymph node dissection (ALND) or sentinel lymph node biopsy (SLNB). SLNB, when indicated, was performed after NAC. When fine-needle aspiration (FNA) showed a positive node, ALND was performed. A pathological complete response (pCR) was defined as the absence of residual invasive cancer cells in the breast and axillary lymph nodes (ypT0/is ypN0).
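As a small illustration of the pCR definition, the sketch below checks a pair of post-NAC pathological stages. The string encoding of ypT and ypN categories ("ypT0", "ypTis", "ypN0") is a hypothetical data format chosen for the example.

```python
# Hypothetical string encoding of post-NAC pathological stages.
PCR_BREAST_STAGES = {"ypT0", "ypTis"}  # no residual invasive carcinoma in the breast


def is_pcr(yp_t: str, yp_n: str) -> bool:
    """pCR = no residual invasive cancer in breast and axillary nodes (ypT0/is ypN0)."""
    breast_clear = yp_t in PCR_BREAST_STAGES  # residual in situ disease is still compatible with pCR
    nodes_clear = yp_n == "ypN0"
    return breast_clear and nodes_clear
```

Residual ductal carcinoma in situ (ypTis) therefore does not exclude pCR, whereas any residual nodal disease does.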
After surgery, adjuvant treatment was administered in accordance with Institut Curie Treatment Guidelines. Adjuvant chemotherapy was administered at the physician's discretion, according to the pathological response to NAC and lymph node status; in most cases, the regimen was 5FU-Navelbine (FUN). From 2005 onwards, trastuzumab was administered to patients with HER2 overexpression. Patients received adjuvant radiotherapy according to national guidelines: after lumpectomy; after radical mastectomy for patients with an initial T3 or T4 tumour; for all patients with involved axillary lymph nodes; and for a selection of high-risk node-negative patients.
Endocrine therapy (tamoxifen, aromatase inhibitor, or GnRH agonists) was added to the regimen as an adjuvant treatment, for almost all hormone receptor-positive tumours. After the completion of treatment, the patients were followed up every four months for the first two years, every six months for the next three years and then annually from the fifth year onwards. Clinical examination, mammography and breast ultrasound were performed annually.
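The follow-up schedule can be written as a simple visit generator. This is a sketch under two assumptions not stated in the text: months are counted from the end of treatment, and annual visits start after month 60 (the phrase "from the fifth year onwards" could also be read as starting slightly earlier).

```python
from typing import List


def follow_up_months(horizon_months: int) -> List[int]:
    """Visit times (months after end of treatment) under the stated schedule (sketch)."""
    visits = []
    month = 0
    while True:
        if month < 24:
            step = 4   # every four months during the first two years
        elif month < 60:
            step = 6   # every six months during the next three years
        else:
            step = 12  # annually thereafter (assumed to begin after month 60)
        month += step
        if month > horizon_months:
            return visits
        visits.append(month)
```

For example, `follow_up_months(72)` yields six visits in the first two years, six in the next three, and the first annual visit at month 72.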
Time to IBTR was defined as the time from diagnosis to local recurrence in the previously treated breast or, for patients without IBTR, to the last follow-up visit. Distant metastasis-free survival (DMFS) was defined as the time from diagnosis to distant recurrence or to the last follow-up visit, whichever occurred first. Overall survival (OS) was defined as the time from diagnosis to death or to the last follow-up visit, whichever occurred first.
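All three endpoints share the same construction: a time measured from diagnosis and an event indicator, with censoring at last contact. The sketch below derives both from dates; the function name and the conversion of days to months (365.25/12 days per month) are assumptions for illustration.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # assumed conversion from days to months


def time_to_event(diagnosis: date, event: Optional[date], last_follow_up: date) -> Tuple[float, int]:
    """Return (time in months from diagnosis, status), status 1 = event, 0 = censored.

    `event` is the endpoint-specific event date (IBTR, distant recurrence or death),
    or None if no such event was recorded.
    """
    if event is not None and event <= last_follow_up:
        end, status = event, 1           # event observed first
    else:
        end, status = last_follow_up, 0  # censored at the last known contact
    return (end - diagnosis).days / DAYS_PER_MONTH, status
```

The same patient can contribute an event to time to IBTR (passing the IBTR date) while being censored for DMFS (passing `None` if no distant recurrence occurred).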
Patients for whom none of these events were recorded were censored at the date of last known contact.

Statistical method
As the traditional Cox model was not well suited to our question, multivariate analysis was performed with the random survival forest (RSF) method, a non-parametric approach to survival analysis [19]. A set of survival trees of similar size was built by recursive partitioning, each on a training set drawn from the original data set by bootstrap sampling with replacement (bootstrap aggregation, or bagging). Each tree was then tested on the patients not included in its bootstrap sample (the out-of-bag, or validation, set). At each node of a tree, a random subset of predictors was selected as candidate variables for splitting, making the forest robust to correlations between predictors. In each tree, survival time and event status were treated as the response. Each RSF run was performed with 3000 trees, the log-rank splitting rule and five predictors randomly selected at each split. Missing data were handled by multiple imputation. We analysed 16 clinical and histological variables: age, body mass index (BMI), menopausal status, clinical T and N stage, surgery type (lumpectomy or mastectomy), histological subtype, initial tumour size, initial tumour grade, initial Ki67 status, molecular subtype, margin status, pCR, postoperative nodal involvement (ypN stage), local recurrence and time to IBTR.
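The core steps of tree growing described above (bootstrap sampling with an out-of-bag set, a random subset of `mtry` candidate predictors at each node, and the log-rank splitting rule) can be sketched in plain Python. This is a didactic sketch, not the software used in the study; all function names are hypothetical, and a full forest would apply `best_split` recursively to the in-bag patients of each of the 3000 trees with `mtry = 5`.

```python
import math
import random
from typing import List, Sequence, Tuple


def logrank_statistic(times: Sequence[float], events: Sequence[int], left: Sequence[bool]) -> float:
    """Standardised two-sample log-rank statistic |L| for a candidate split (left vs right)."""
    num, var = 0.0, 0.0
    for t in sorted({ti for ti, di in zip(times, events) if di == 1}):  # distinct event times
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_left = sum(1 for i in at_risk if left[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d_left = sum(1 for i in at_risk if times[i] == t and events[i] == 1 and left[i])
        num += d_left - d * n_left / n  # observed minus expected events in the left node
        if n > 1:
            var += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return abs(num) / math.sqrt(var) if var > 0 else 0.0


def best_split(X: List[List[float]], times, events, mtry: int, rng: random.Random):
    """Return (variable index, cut point, statistic) maximising the log-rank statistic,
    searching only `mtry` randomly chosen predictors, as at each node of an RSF tree."""
    p = len(X[0])
    best = (None, None, 0.0)
    for j in rng.sample(range(p), min(mtry, p)):
        values = sorted({row[j] for row in X})
        for cut in values[:-1]:  # the largest value would leave the right node empty
            left = [row[j] <= cut for row in X]
            stat = logrank_statistic(times, events, left)
            if stat > best[2]:
                best = (j, cut, stat)
    return best


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """In-bag indices (drawn with replacement) and out-of-bag indices used to test the tree."""
    inbag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(inbag))  # about 36.8% of patients on average
    return inbag, oob
```

Because each node sees only a random subset of predictors, two correlated variables do not always compete for the same split, which is what decorrelates the trees of the forest.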
The importance of each covariate was assessed with two internal variable-ranking measures: variable importance (VIMP) and minimal depth. VIMP is the increase in validation-set prediction error when the values of the variable are randomly permuted: the larger the VIMP, the more predictive the variable. A VIMP close to zero indicates that the variable makes little or no contribution to predictive accuracy, and a negative value indicates that predictive accuracy is improved by omitting the variable. Minimal depth is the distance from the root node to the first split on the variable, averaged over the trees of the forest. The smaller the minimal depth, the more predictive the variable, because variables with small minimal depths split large subsets of the population at nodes close to the root.
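Permutation VIMP can be illustrated with a generic fitted risk score. One common choice of prediction error for survival forests is 1 minus Harrell's concordance index, which is assumed here. In an RSF, `predict` would be the forest's ensemble prediction evaluated on out-of-bag data; in this sketch it is any hypothetical risk function, and the names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index: among usable pairs, the share in which the patient
    who had the event first was assigned the higher predicted risk (ties count 1/2)."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable if usable else 0.5


def permutation_vimp(predict: Callable[[Row], float], X: List[Row], times, events,
                     var: int, rng: random.Random) -> float:
    """VIMP of column `var`: prediction error (1 - C) after permuting that column,
    minus the error on the unpermuted validation data."""
    base_error = 1.0 - harrell_c(times, events, [predict(row) for row in X])
    column = [row[var] for row in X]
    rng.shuffle(column)  # break the link between this variable and the outcome
    X_perm = [row[:var] + [value] + row[var + 1:] for row, value in zip(X, column)]
    perm_error = 1.0 - harrell_c(times, events, [predict(row) for row in X_perm])
    return perm_error - base_error
```

A variable that the predictor ignores has a VIMP of exactly zero, because permuting it leaves every prediction unchanged.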
VIMP and minimal depth may rank the variables differently. We therefore selected variables whose VIMP and minimal-depth rankings were concordant and which had high predictive value.
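Minimal depth and a concordance-based selection can be sketched as follows, with each tree stored as a nested dictionary. Two parts are assumptions rather than details from the text: a variable never used in a tree is assigned that tree's maximum depth, and "concordant" is operationalised as appearing in the top `k` of both rankings. All names are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Dict, List, Optional

# A split node is {"var": name, "left": node, "right": node}; a terminal node is {} (or None).
Node = Optional[dict]


def tree_depth(node: Node, depth: int = 0) -> int:
    """Depth of the deepest terminal node (root = depth 0)."""
    if not node or "var" not in node:
        return depth
    return max(tree_depth(node["left"], depth + 1), tree_depth(node["right"], depth + 1))


def minimal_depth(tree: Node, var: str) -> int:
    """Depth of the split on `var` that is closest to the root (breadth-first search)."""
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if not node or "var" not in node:
            continue
        if node["var"] == var:
            return depth  # breadth-first: the first match is the shallowest
        queue.append((node["left"], depth + 1))
        queue.append((node["right"], depth + 1))
    return tree_depth(tree)  # assumed penalty when `var` is never used


def forest_minimal_depth(forest: List[Node], var: str) -> float:
    return mean(minimal_depth(tree, var) for tree in forest)


def concordant_top(vimp: Dict[str, float], depth: Dict[str, float], k: int) -> List[str]:
    """Hypothetical rule: variables in the top k by both VIMP (high) and minimal depth (low)."""
    top_vimp = set(sorted(vimp, key=vimp.get, reverse=True)[:k])
    top_depth = set(sorted(depth, key=depth.get)[:k])
    return sorted(top_vimp & top_depth)
```

The choice of `k` is arbitrary in this sketch; the text does not state how many top-ranked variables were compared.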
Partial dependence plots were used to describe the adjusted predicted response to the covariate of interest, by integrating out the effects of variables other than the covariate of interest. Analyses were performed for the total population, and then for the three molecular subtypes separately.
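A partial dependence curve is computed by fixing the covariate of interest at each grid value for every patient and averaging the predictions over the observed values of the other covariates. In the study, the prediction would be the forest's estimated three- or five-year DMFS; the toy model below is a hypothetical stand-in, not a fitted result.

```python
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(predict: Callable[[Row], float], X: List[Row], var: int,
                       grid: Sequence[float]) -> List[float]:
    """Average prediction when column `var` is set to each grid value for every patient,
    keeping the other covariates at their observed values (i.e. averaging them out)."""
    curve = []
    for value in grid:
        preds = [predict(row[:var] + [value] + row[var + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_dmfs_3y(row: Row) -> float:
    """Hypothetical 3-year DMFS model: row[0] = time to IBTR (months), row[1] = TNBC (0/1)."""
    return 0.60 + 0.005 * min(row[0], 50.0) - 0.05 * row[1]
```

Because the other covariates are averaged rather than held at a single reference value, the curve reflects the adjusted effect of the covariate across the observed case mix.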

Results
Three-year DMFS increased with time to recurrence (Fig 2A), reaching a plateau at a time to IBTR of 50 months; beyond this point, DMFS remained constant. A threshold of 50 months to IBTR was also appropriate for differentiating between early and late recurrence groups for five-year DMFS.
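The plateau was read from the partial dependence plots. One hypothetical way to make such a reading reproducible, not used by the authors, is to fit a "rise then plateau" hinge model, DMFS ≈ a + b·min(time to IBTR, c), to the curve and choose the breakpoint c with the smallest squared error. The sketch below assumes the curve has been evaluated on a grid of times.

```python
from typing import Sequence, Tuple


def fit_line(z: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares intercept and slope of y on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    if szz == 0:
        return my, 0.0
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    return my - slope * mz, slope


def hinge_breakpoint(x: Sequence[float], y: Sequence[float], candidates: Sequence[float]) -> float:
    """Breakpoint c minimising the squared error of y ~ a + b * min(x, c),
    i.e. a linear rise up to c followed by a plateau."""
    best_c, best_sse = candidates[0], float("inf")
    for c in candidates:
        z = [min(xi, c) for xi in x]
        a, b = fit_line(z, y)
        sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
        if sse < best_sse:
            best_c, best_sse = c, sse
    return best_c
```

The hinge form builds in a flat segment after the breakpoint; it suits a curve that plateaus, but would be misleading for a curve that keeps rising slowly.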
After adjustment for other variables, three- and five-year DMFS were lower in patients experiencing local recurrence, in patients with TNBC, and for shorter times to recurrence (Fig 2B). Patients free from local recurrence had a three-year DMFS of 87% and a five-year DMFS of 83%, whereas patients experiencing recurrence had a three-year DMFS of 77% and a five-year DMFS of 64%. Three-year DMFS was similar for patients with luminal tumours and patients with HER2-positive tumours (about 88%), whereas TNBC patients had a three-year DMFS of 83%. At five years, DMFS was highest in the HER2-positive subgroup (84%), whereas TNBC patients had the worst prognosis (DMFS: 78%). The left-hand axis shows how DMFS decreases over follow-up time, and the right-hand axis shows how shorter times to IBTR affect survival. Patients with a time to IBTR exceeding 50 months had a DMFS greater than 80% at 60 months of follow-up.
We defined a cut-off of 50 months for differentiating between early and late recurrences. For early recurrences (i.e. occurring between 0 and 50 months after diagnosis), a longer time to IBTR was associated with a higher DMFS, for all follow-up times considered.
DMFS was similar for all late recurrences (defined as occurring more than 50 months after the primary tumour), regardless of follow-up time, as shown by the plateau in the figure.
We then performed the same analysis separately for the three molecular subtypes (Fig 4A and 4B).
For all subtypes, three-year DMFS (Fig 4A) was lower for patients experiencing recurrence before 50 months than for those experiencing recurrence after 50 months. For five-year DMFS, the shape of the survival curve was similar to that for three-year DMFS for the TNBC subtype (cut-off at 50 months), whereas the cut-off seemed to occur later for the luminal and HER2-positive subtypes (after more than 75 months; Fig 4B).

Table 1. Clinical, histological and follow-up characteristics of patients.

Time to recurrence and metastasis in breast cancer

Discussion
Our results, obtained with a non-parametric approach, confirmed the prognostic importance of local recurrence, time to IBTR and molecular subtype for the occurrence of metastasis after NAC in breast cancer. Local recurrence and time to IBTR were the most important of the variables analysed for predicting patient outcome. Local recurrence has long been considered a poor prognostic factor per se [7][8][9][10][11][12]. Rouzier et al. found a relative risk of 5.34 (95% CI: 3.23-8.82) for the development of distant metastasis after local relapse in 257 patients treated by NAC and conservative surgery [9]. In that study, 59.7% of patients experiencing a local recurrence had developed distant metastases at five years. Time to recurrence has been shown to be an important prognostic factor in early breast cancer [8,9,18,30-32], with patients with late recurrence having better outcomes than those with early recurrence [15]. However, the precise definition of "early" and "late" recurrence remains a matter of debate, with the threshold generally set at between two and five years [8,13,15,17,31,33]. An accurate definition of this threshold is essential, as it can be used to distinguish between true recurrences and new primary tumours, which should be managed differently. Chemotherapy is widely used for the treatment of recurrences, and all international recommendations include the use of time to recurrence to guide treatment decisions. For local management, radical mastectomy could be reserved for cases of true recurrence (i.e. aggressive disease), whereas new primary tumours (with a non-aggressive profile) could be treated by secondary conservative surgery.
Baulies et al. showed, using a time-dependent variable, that local recurrence within the first five years in patients treated by conservative surgery was a prognostic factor strongly associated with the development of distant metastases (HR 4.21; 95% CI 2.89-6.11; P<0.001) [18]. Other studies using a three-year threshold to distinguish between early and late recurrences found that early recurrence was associated with a higher risk of distant metastases [9,15]. Fredrikson used a ROC curve to determine the best trade-off between specificity (66%) and sensitivity (62%) for local recurrence, identifying a cut-off of 2.3 years after surgery for the risk of death [33]. Gosset et al. identified 34 months as the cut-off for time to IBTR that minimised the p-value for the relationship between time to IBTR and DMFS. They obtained different cut-offs for tumours with different hormone receptor (HR) statuses (HR-positive: 49 months versus HR-negative: 33 months) [14].
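A ROC-based cut-off of the kind described above can be sketched as follows. The text does not say which criterion defined the "best trade-off"; Youden's J (sensitivity + specificity - 1) is one common choice and is assumed here, with a recurrence at or before the cut-off taken as predicting death. The data in any example are toy values, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def youden_time_cutoff(times: Sequence[float], died: Sequence[int]) -> Tuple[float, float, float]:
    """Cut-off c on time to recurrence maximising Youden's J = sensitivity + specificity - 1,
    predicting death when the recurrence occurred at or before c.
    Returns (c, sensitivity, specificity)."""
    n_dead = sum(died)
    n_alive = len(died) - n_dead
    best = (-1.0, 0.0, 0.0, 0.0)  # (J, cut-off, sensitivity, specificity)
    for c in sorted(set(times)):
        tp = sum(1 for t, d in zip(times, died) if t <= c and d == 1)
        tn = sum(1 for t, d in zip(times, died) if t > c and d == 0)
        sens, spec = tp / n_dead, tn / n_alive
        if sens + spec - 1 > best[0]:
            best = (sens + spec - 1, c, sens, spec)
    return best[1], best[2], best[3]
```

Unlike the minimal p-value approach, this criterion ignores censoring and follow-up time, which is one reason why cut-offs obtained by different methods are hard to compare.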
Estimation of the prognostic effect of a time-dependent covariate with a standard Cox model is known to be potentially unsatisfactory [20], as the cut-off or the exact shape of the relationship between DMFS and time to IBTR is difficult to determine without specific models. A limitation common to all these methods is the a priori definition of the shape of the relationship: the functional form of a time-dependent effect must be specified (e.g. ln(t), 1/t, a spline or a polynomial), and a single cut-off value imposes a restrictive, step-shaped relationship. The random forest approach is non-parametric and does not require explicit specification of the shape of the covariate response, making it possible to estimate the relationship between DMFS and time to IBTR from the data. This is the first study, to our knowledge, to have used a non-parametric statistical method to evaluate prognostic factors in patients with breast cancer treated by NAC. Several studies have confirmed the promise of RSF relative to Cox PH models on real datasets [22,26,27], and have reported better performance for RSF than for Cox PH models on the basis of the prediction error criterion [34]. This approach can be used to identify risk factors for poor breast cancer survival. RSF deals automatically and coherently with the limitations of traditional Cox models, such as the proportional hazards assumption [35], and does not require advance knowledge of the relationship (linear or nonlinear) between a variable and time [22].
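For reference, the time-dependent Cox formulation discussed above can be written as follows, where $h_0$ is the baseline hazard, $x$ a covariate, and $\beta_0$, $\beta_1$ and $g$ are notation introduced here for illustration. The user must choose $g$ in advance, and the proportional hazards model is the special case $\beta_1 = 0$; a dichotomised covariate $x_c = \mathbf{1}\{T_{\mathrm{IBTR}} \le c\}$ similarly fixes a step shape at a chosen cut-off $c$.

```latex
h(t \mid x) = h_0(t)\,\exp\{\beta(t)\,x\}, \qquad
\beta(t) = \beta_0 + \beta_1\, g(t), \qquad
g(t) \in \{\ln t,\; 1/t,\; \text{spline or polynomial basis}\},

\mathrm{HR}(t) = \exp\{\beta_0 + \beta_1\, g(t)\}, \qquad
\beta_1 = 0 \;\Rightarrow\; \mathrm{HR}(t) = e^{\beta_0} \ \text{(proportional hazards)}.
```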
Using this approach, we identified a time to recurrence of 50 months as the most appropriate threshold for differentiating between good and poor prognosis groups. Early local recurrence probably reflects greater biological aggressiveness and/or higher resistance to treatment. This was particularly true for TNBC, which is widely recognised as having a good prognosis provided that no early recurrence occurs in the first five years after diagnosis. It is already known that recurrence patterns differ between molecular subtypes, with earlier recurrences for TNBC than for luminal tumours [36-38], but this study provides additional evidence that late recurrence of TNBC is associated with a good prognosis.
The main limitation of this study was the use of a method often considered a "black box", which does not provide readily interpretable summary measures such as the hazard ratios of the Cox PH model. We used graphical methods to assess the predicted dependence of the response on covariates, and such interpretations may be subjective. However, graphical presentations facilitate comprehension of the interrelationships between covariates.
Another point concerns the indication for radiotherapy. Like many other national, regional and local guidelines, the French guidelines recommend post-mastectomy radiotherapy (PMRT) more often than more restrictive guidelines do. We advise PMRT for all patients with involved axillary lymph nodes and for a selection of high-risk node-negative patients. Our cohort consisted of patients with an indication for primary systemic therapy; these patients therefore had higher-risk features at diagnosis and, consequently, an indication for PMRT. Our study is thus homogeneous in that all patients received primary systemic therapy and PMRT. We therefore cannot compare our cohort with other patient groups, but this was not the aim of this study.
Our results suggest that time to IBTR should be seen as a key element for determining the patient's prognosis. Our findings require validation in further studies, to determine whether patients with late recurrence have a similar prognosis to those with primary tumours. If this proves to be the case, then such late recurrences should be considered equivalent to a new primary tumour in terms of prognosis, for decisions concerning systemic treatments.
Conversely, patients with an early local relapse should be considered to have aggressive and resistant disease with a high risk of distant metastasis. These patients should be invited to participate in clinical trials assessing new therapeutic strategies, as initial chemotherapy failed to eradicate the tumour cells.