Quantitative Measurement of Melanoma Spread in Sentinel Lymph Nodes and Survival

In this study, Klein and colleagues investigated the impact of minimal cancer spread to the sentinel lymph node and of increasing numbers of disseminated cancer cells on melanoma-specific survival. The authors found that cancer cell dissemination to the sentinel node is a quantitative risk factor for melanoma death and that the best predictor of outcome was a model based on the combined quantitative effects of disseminated cancer cell density (DCCD), tumor thickness, and ulceration.


Introduction
For melanoma staging, sentinel node biopsy has been established to assess melanoma cell dissemination and has become the most widely used procedure to determine the regional lymph node status in patients with cutaneous melanoma [1,2]. Because an evidence-based lower threshold for clinically relevant melanoma spread could not be defined [3], the detection of isolated melanoma cells in sentinel nodes was included in the latest American Joint Committee on Cancer (AJCC) staging recommendations [4]. However, the prognostic value of small tumor deposits in sentinel nodes is not unanimously accepted [5-7]. One potential reason for this lack of confidence may be that, unlike measurements of primary tumor thickness, highly sensitive methods for precise and direct quantification of sentinel node involvement by histopathology, optimal sample preparation, and screening are still lacking. The chance of detecting rare tumor cells by histopathology depends on the number of sections screened [8], and extensive histopathological protocols [7,9-11] can achieve a detection rate of 30% but require the analysis of 24 to 36 slides per node [7]. Since this translates into 42 h of examination time for about ten melanoma patients a week [12], it is impracticable for many institutions. Because of these practical constraints and a lack of standardization [7,10,11,13-16], the accuracy of sentinel node analysis has been limited.
We previously developed a quantitative immunocytological assay to identify early cancer spread in sentinel nodes [17]. In this assay, the sentinel node is disaggregated, and disseminated cancer cells (DCCs) are detected by immunostaining for gp100, an antigen involved in melanin synthesis, using the HMB45 antibody [17]. The number of DCCs per 10^6 isolated cells defines the DCC density (DCCD). In the current study we applied this assay to a prospective cohort of 1,027 patients. The aim was to evaluate its predictive value as a quantitative variable in comparison to qualitative routine histopathology and to address the role of minimal tumor seeding in the survival of melanoma patients.

Ethics Statement
Our study complied with the guidelines of the Declaration of Helsinki. As such, the institutional review boards of the University of Tübingen (ethics vote number 5/99) and the University of Regensburg (ethics vote number 07/79) approved the study. All patients provided written informed consent to examination of their sentinel nodes by disaggregation immunocytology, to the recording of their follow-up data in the Central Malignant Melanoma Registry of the German Dermatological Society, and to the molecular characterization of the isolated cells. Control (nonmelanoma) nodes were obtained from patients with chronic venous insufficiency in whom a lymph node was removed during crossectomy, or from non-melanoma skin cancer patients. Sample acquisition was in agreement with the rules of the Ethics Committee of the University of Tübingen for use of waste tissue. No personal data were recorded from the control patients.

Patients
From February 8, 2000, to June 16, 2008, we enrolled 1,154 patients who underwent lymphatic mapping and sentinel node biopsy at the University Hospital Tübingen, Germany, for histopathologically proven first invasive primary cutaneous melanoma. At the Department of Dermatology, University Hospital Tübingen, sentinel node biopsy is generally recommended for patients with melanoma lesions with a Breslow's tumor thickness of ≥1.0 mm, with primary tumors of Clark level IV or V, or with tumors of any Breslow's thickness but showing regression or ulceration. Twelve patients without other risk factors requested sentinel node biopsy although their melanomas were thinner than 1 mm. The preoperative staging to exclude metastatic disease consisted of a physical examination, ultrasound examination of regional lymph nodes and the abdomen, chest X-ray, and computed tomography brain scans. For final analysis 119 patients were excluded as they had a follow-up of less than 3 mo, and eight patients were excluded because of missing information about primary tumor thickness. The remaining 1,027 patients (Table 1) included 322 whose DCCD results have been reported in an interim analysis [17], however without follow-up information. Skin draining control lymph nodes (n = 58) were obtained from 52 nonmelanoma patients (52 skin draining nodes from nonmalignant conditions, six sentinel nodes from non-melanoma skin cancer patients) and disaggregated, stained, and evaluated identically to the lymph nodes of melanoma patients.

Lymphatic Mapping, Sentinel Node Biopsy, and Tumor Cell Detection
Cutaneous lymphoscintigraphy, sentinel node biopsy, and sample preparation were performed as previously described [17] with minor modifications. From the beginning of the study until 31 July 2003, the procedure was as follows. The lymph node was cut along its longitudinal axis for histopathological and immunocytological examination. One half of the sentinel node was fixed in 3.5% formaldehyde, paraffin-embedded, and subjected to standard histopathological treatment, which included hematoxylin and eosin staining and immunohistochemistry on three 4-µm paraffin sections from the central level. From 1 August 2003 until the end of the study, the procedure was as follows. The lymph nodes were cut perpendicularly to the long axis [15]. For histopathology, 2-mm slices were cut after formalin fixation of the tissue. Hematoxylin and eosin staining and immunostaining (using antibodies directed against S100, HMB45, and Melan-A) of sections from each level were performed as described above. Thus, the total number of sections examined per node varied according to the size of the node. However, in all cases at least two levels (with four 4-µm sections each) were examined. The complete histopathological workup of the lymph nodes was done at the Department of Pathology, University of Tübingen, without knowledge of the immunocytological gp100 result. A patient was documented as histopathologically positive if at least one node was considered positive by the histopathological examination. Patients with isolated tumor cells were considered histopathologically positive.
Quantitative immunocytology was performed immediately after sentinel node biopsy at the Department of Dermatology, University of Tübingen, using the other unfixed half of the lymph node [17]. The lymphatic tissue was cut into 1-mm pieces and disaggregated mechanically into a single-cell suspension by rotating knives (DAKO Medimachine, DAKO), washed with HBSS (Life Technologies), and centrifuged on a density gradient made of a 60% Percoll solution (Amersham). Cells were counted using a Neubauer counting chamber. Per slide, 10^6 cells from the interphase were then dispensed onto adhesion slides (Menzel) in a volume of 1 ml of PBS. After sedimentation for 1 h, the slides were air-dried overnight. Immunocytological staining was carried out with the alkaline phosphatase/anti-alkaline phosphatase method using primary antibodies against gp100 (HMB45, DAKO) and Melan-A (A103, DAKO), and 5-bromo-4-chloro-3-indolyl phosphate/NBT (DAKO) as substrate, yielding a blue reaction product. A lymph node was defined as gp100 positive or Melan-A positive if it contained at least one gp100-positive or one Melan-A-positive cell, respectively. The number of positive cells per million isolated cells was recorded after screening of the slides by a technical assistant and final evaluation by a dermatologist, both experienced in evaluation of cytological preparations. The recording was done without knowledge of the histopathological findings or other clinical data. Positive preparations were air-dried or stored for a maximum of 4 d in PBS until cell isolation for genomic analyses.
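The density calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function names dccd and patient_dccd are hypothetical, and the patient-level value is taken as the maximum over a patient's nodes, following the definition of DCCD given later in the Methods.

```python
def dccd(positive_cells, cells_screened):
    """Positive cells per 10^6 screened cells (hypothetical helper)."""
    if cells_screened <= 0:
        raise ValueError("cells_screened must be positive")
    return positive_cells * 1e6 / cells_screened


def patient_dccd(nodes):
    """Patient-level DCCD: the maximum over (positive_cells, cells_screened) pairs."""
    return max(dccd(positive, screened) for positive, screened in nodes)


# Example: two slides of 10^6 cells each, four gp100-positive cells in total.
print(dccd(4, 2e6))
```

Because the count is normalized to the number of cells actually screened, a single positive cell found among 5 × 10^6 screened cells yields a DCCD below 1.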

Single-Cell Comparative Genomic Hybridization
Single-cell comparative genomic hybridization (CGH) was performed as previously described [18,19]. In brief, proteinase K was used to digest cellular proteins after isolation, the single-cell genome was digested using MseI, adaptors were ligated to the 5′ overhangs, and the DNA fragments were amplified by PCR, resulting in an MseI representation of a single-cell genome. The reagents and protocol are now commercially available as a kit (Ampli1, Silicon Biosystems). These amplicons were labeled and hybridized onto metaphase spreads or an Agilent 180 K microarray for array CGH [35]. Histograms for the CGH data were generated using the online algorithms at http://progenetix.net [20]. Twelve of the 46 patients from whom we isolated gp100-positive cells had samples with a median DCCD of 2 gp100-positive cells per million isolated cells (range 1 to 7) and were included solely to investigate the genomes of the early DCCs.

Statistical Analyses
Melanoma-specific survival rates were calculated from the date of sentinel node biopsy until death from melanoma or the last follow-up. The 5-y survival percentages are derived from the Kaplan-Meier survival estimates, F(t). The 95% confidence intervals were based on the log(−log F(t)) transformation as described by Kalbfleisch and Prentice [21]. We calculated Pearson's correlation coefficient of log(DCCD + 1) and log(Melan-A + 1) for assessing the association of DCCD and Melan-A. For the comparison of positive DCCD values among subgroups of the other six prognostic variables we used either two-sample t-tests or one-way ANOVA after logarithmic transformation, i.e., we compared geometric means.
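As an illustration of the survival estimate and its confidence interval, the following sketch computes a Kaplan-Meier curve with intervals obtained on the log(−log) scale. It assumes Greenwood's variance formula and a normal quantile of 1.96; the function name and the toy data are hypothetical, and the analyses in this study were run in JMP, not with this code.

```python
import math


def kaplan_meier_loglog(times, events, z=1.96):
    """Return (time, survival, lower, upper) at each distinct event time.

    times: follow-up times; events: 1 = death from melanoma, 0 = censored.
    The interval is built on the log(-log S) scale and back-transformed.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    greenwood = 0.0  # running sum d / (n * (n - d))
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            if at_risk > deaths:
                greenwood += deaths / (at_risk * (at_risk - deaths))
            if 0.0 < surv < 1.0:
                se = math.sqrt(greenwood) / abs(math.log(surv))
                theta = math.exp(z * se)
                lower, upper = surv ** theta, surv ** (1.0 / theta)
            else:
                lower = upper = surv
            out.append((t, surv, lower, upper))
        at_risk -= leaving
    return out


# Toy data (months of follow-up, event indicator); values are made up.
for row in kaplan_meier_loglog([1, 2, 3, 4], [1, 0, 1, 1]):
    print(row)
```

Working on the log(−log) scale keeps both interval limits between 0 and 1, which a symmetric interval on the survival scale does not guarantee.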
We used dot plots together with quartiles to show the differences in the distribution of DCCD values among the groups defined by the other variables.
We used univariable Cox regression models for the following seven predictors: gender, age, Breslow's thickness, ulceration, localization, nodal status pathology, and DCCD. A log transformation was used for the variable tumor thickness, and the logarithm of DCCD + 1 was used for the variable DCCD; hazard ratios are reported together with their 95% confidence intervals. p-Values are given for the likelihood ratio tests of the Cox models. In addition, we calculated hazard ratios after grouping of the three continuous variables age, thickness, and DCCD. We plotted the hazard ratios of the models with the continuous and the grouped data for DCCD and tumor thickness in order to verify a linear model on a logarithmic scale.
For multivariable Cox regression analyses we adopted the model selection criterion according to Schwarz [22] and used the minimal value of the Bayes information criterion (BIC) to select the optimal model. This quantity is the sum of twice the negative log likelihood of the model plus the number of parameters times the logarithm of the sample size. Instead of evaluating the likelihood for all 128 (= 2^7) possible models we started with the model that included all seven variables, and then successively deleted the variable with the highest p-value. This approach finds the model with the lowest BIC value for a given number of variables, which was verified by calculating the BIC value for all 128 models.
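The criterion described above is simple to compute once a model's log likelihood is known. The sketch below is illustrative only: the function names are hypothetical and the log-likelihood values in the example are invented, not taken from Tables 3 and 4.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Twice the negative log likelihood plus n_params * log(n_samples)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def best_by_bic(candidates, n_samples):
    """candidates maps a model name to (log_likelihood, n_params)."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_samples))


# Invented log likelihoods for three nested candidate models.
models = {
    "thickness": (-870.0, 1),
    "thickness+DCCD+ulceration": (-845.0, 3),
    "all seven variables": (-842.0, 7),
}
print(best_by_bic(models, 1027))
```

In this made-up example the seven-variable model has the highest likelihood, but the penalty of log(1,027) per parameter outweighs its small gain over the three-variable model.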
For internal validation we used 100 bootstrap samples and calculated Harrell's c-index with and without correction for optimism [23]. We calculated Harrell's c-index instead of Somers' D because Harrell's c-index (c = (D + 1)/2) estimates the proportion of concordant pairs among all comparable pairs of patients. We proceeded as follows.
Step 1: we determined c_app from our model as selected using the BIC criterion.
Step 2: we generated 100 bootstrap samples from the original dataset by sampling with replacement.
Step 3: for each of these 100 bootstrap samples the same model selection procedure as for the original dataset was applied.
Step 4: for each of the 100 bootstrap samples we calculated the c-index c_boot.
Step 5: the 14 different models found in step 3 were applied to the original dataset, and the corresponding c-indices c_orig determined.
Step 6: the average optimism of the fit was calculated as c_orig − c_boot.
Step 7: the bootstrap-corrected performance of the original stepwise model was calculated as c_app − (c_orig − c_boot).
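The validation steps can be sketched as follows. This is a simplified illustration under stated assumptions: harrell_c counts a pair as comparable only when the patient with the shorter time died, tied times are ignored, and the optimism is taken in the usual sign convention as the mean of (bootstrap-sample c-index minus original-data c-index), so that the corrected value is lower than the apparent one. The function names and the numbers in the example are hypothetical.

```python
def harrell_c(times, events, risk):
    """Proportion of concordant pairs among comparable pairs.

    A pair (i, j) is comparable when patient i died before time j;
    it is concordant when patient i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def optimism_corrected(c_app, c_boot, c_orig):
    """Subtract the mean of (c_boot - c_orig) from the apparent c-index."""
    optimism = sum(b - o for b, o in zip(c_boot, c_orig)) / len(c_boot)
    return c_app - optimism


# Hypothetical c-indices for two bootstrap replicates.
print(optimism_corrected(0.763, [0.78, 0.77], [0.76, 0.76]))
```

In a full implementation, each bootstrap replicate would repeat the BIC-based model selection before c_boot and c_orig are computed, as in steps 3 to 5.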
In order to verify the proportional hazards assumption of the Cox model we divided the patients into two groups for each of the three final predictors and plotted the ratio of their cumulative hazard functions as a function of time. According to the Cox model this ratio should stay constant. As an alternative to the Cox model we used the lognormal distribution as a model for predicting outcome by DCCD, thickness, and ulceration. The lognormal model allows determination of the maximum time-dependent hazard rate and the time at which it occurs (formulas for the lognormal survival probability are in Text S1). We used the lognormal model to determine the 5-y survival probabilities by a nomogram [24].
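To show how a lognormal model yields a time-dependent hazard with a single peak, the sketch below evaluates the textbook lognormal survival and hazard functions and locates the peak by grid search. The parameters mu and sigma are hypothetical placeholders; how the covariates enter them is specified in Text S1, which is not reproduced here, so this is not the authors' fitted model.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((ln t - mu) / sigma) for t > 0."""
    return 1.0 - _norm_cdf((math.log(t) - mu) / sigma)


def lognormal_hazard(t, mu, sigma):
    """h(t) = f(t) / S(t), with f the lognormal density."""
    z = (math.log(t) - mu) / sigma
    density = _norm_pdf(z) / (sigma * t)
    return density / (1.0 - _norm_cdf(z))


def peak_hazard(mu, sigma, t_max=50.0, steps=5000):
    """Grid search for the time at which the hazard rate is highest."""
    grid = (t_max * k / steps for k in range(1, steps + 1))
    t_peak = max(grid, key=lambda t: lognormal_hazard(t, mu, sigma))
    return t_peak, lognormal_hazard(t_peak, mu, sigma)


# Two hypothetical patients: a larger mu corresponds to longer expected survival.
print(peak_hazard(2.0, 1.0))
print(peak_hazard(3.0, 1.0))
```

With sigma held fixed, increasing mu rescales time, so the hazard peak occurs later and is lower. Unlike the Cox model, the ratio of two such hazard curves is not constant over time.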

Model Validation and Comparison
To assess the goodness of fit of the models (Table S3), we divided the 1,027 patients into 18 subgroups according to the following criteria: three groups of DCCD values, with DCCD = 0 assigned the value 0, 0 < DCCD < 100 assigned the value 1, and DCCD ≥ 100 assigned the value 2; three groups of tumor thickness, with tumor thickness ≤ 2 mm assigned the value 1, 2 mm < tumor thickness ≤ 4 mm assigned the value 2, and tumor thickness > 4 mm assigned the value 3; and ulceration no/yes. We restricted the analysis to 18 subgroups because otherwise the number of patients per subgroup would be too small. Only a slight difference in the hazard ratio was observed between the categories 0 < DCCD ≤ 3 and 3 < DCCD < 100, as shown in Table 2. These categories were therefore combined. For each of the 1,027 patients, we calculated the expected failure probability for the individual follow-up times. The observed numbers of deaths in the 18 subgroups were compared to the expected numbers of deaths by the chi-square statistics. The goodness of fit of the models was compared by the sum of the 18 chi-square values, taking into account the number of degrees of freedom, which depends on the number of estimated parameters. Since the models were not nested we did not perform likelihood ratio tests. We provide the chi-square statistics only for descriptive purposes. The grouping of patients according to AJCC criteria was based on the AJCC 2009 recommendations (which include assignment of isolated tumor cells as nodal positive) [4], with the exception that mitotic rate could not be included because it had not been assessed at the beginning of the study. For the AJCC grouping, the nodal status was determined by histopathology and not by immunocytology.
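The subgroup assignment and the summed chi-square statistic can be sketched as below. The cutoffs follow the text; the Pearson form (observed − expected)² / expected for each subgroup is an assumption, since the exact statistic is not spelled out here, and the function names and counts in the example are hypothetical.

```python
def subgroup(dccd, thickness_mm, ulcerated):
    """Map a patient to one of the 3 x 3 x 2 = 18 subgroups."""
    if dccd == 0:
        dccd_group = 0
    elif dccd < 100:
        dccd_group = 1
    else:
        dccd_group = 2
    if thickness_mm <= 2:
        thickness_group = 1
    elif thickness_mm <= 4:
        thickness_group = 2
    else:
        thickness_group = 3
    return (dccd_group, thickness_group, int(bool(ulcerated)))


def chi_square_sum(observed, expected):
    """Sum of (O - E)^2 / E over subgroups (Pearson form, assumed)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed and expected deaths in two subgroups.
print(subgroup(3, 2.0, True))
print(chi_square_sum([10, 5], [8.0, 5.0]))
```

Here the expected number of deaths in a subgroup would be the sum of the model's predicted failure probabilities at each patient's individual follow-up time.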
For the parametric model with the variables thickness, DCCD, and ulceration, each individual patient was characterized by his or her risk score. This score was a linear combination of the logarithms of tumor thickness and DCCD and of ulceration. The purpose was to assess the goodness of fit of our model in four well-defined groups of patients. Since the precision of Kaplan-Meier estimates depends essentially on the number of deaths in a sample, we wanted to achieve similar precision in all four groups. To this end the 138 patients who died from melanoma were divided into four groups with increasing risk scores of death, as defined by the survival model. Subsequently, the risk thresholds of the four groups were applied to all the patients. For given values of the three variables included in the predictive models each patient can uniquely be assigned to one of the four groups. All four groups differed from each other significantly (all p-values <0.001 in the log-rank test comparing group 2 to group 1, group 3 to group 2, and group 4 to group 3).
We next compared the goodness of fit of the parametric model with the variables thickness, DCCD, and ulceration with a Cox model using the same variables. We found that the parametric model performed better (p = 0.13, sum of χ² values = 21.1) than the corresponding Cox proportional hazards model (p = 0.03, sum of χ² values = 27.4) after grouping all 1,027 patients into the 18 risk groups (see above and Table S3).
To assess whether a model that included the information on Melan-A staining in addition to gp100 staining could further improve outcome prediction, we compared our model based on DCCD (the maximum number of gp100-positive cells per million isolated cells per patient), thickness, and ulceration with the model based on DCCD_2 (defined as the maximum number of gp100- or Melan-A-positive cells per million isolated cells), thickness, and ulceration.
We found that the model that included the information on Melan-A staining was not superior to gp100-based DCCD reporting (p = 0.09, sum of χ² values = 22.7; Table S3). Finally, we compared the predictions based on our parametric model and on AJCC staging [4]. The goodness of fit for our model (p = 0.13, sum of χ² values = 21.1) was much better than that of the AJCC staging model (p<0.0002, sum of χ² values = 36.7). For details and data, see Table S3.
For comparison with the AJCC model, the patients were originally divided into three groups according to the differences in survival predictions between the two models. For individuals in Group S1 of Table S2, the survival probability for the new model was at all times greater than the survival probability according to the AJCC model; the maximum absolute percentage difference was greater than 13%. In Group S2 the survival probability for the new model was at all times smaller than the survival probability according to the AJCC model; the maximum absolute percentage difference was greater than 13%. For Groups S1 and S2, the threshold of 13% was chosen because it exceeds the maximum absolute percentage differences for those patients for whom the survival in one model was not always greater than the survival in the other model. The remaining patients formed Group S3.
The risk scores were the sums of the products of the individual predictors multiplied by the corresponding regression coefficients. Since some tumor thicknesses were less than 1 mm, we also obtained negative risk scores because tumor thickness was on a log scale.
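A short sketch makes the sign behavior of the score concrete. The coefficients below are placeholders: they are the natural logarithms of the unit hazard ratios of the Cox model reported in the Results (6.96, 1.43, and 2.04 per 10-fold increase or per ulceration), used only for illustration, and they are not the coefficients of the parametric model. Base-10 logarithms and the name risk_score are likewise assumptions.

```python
import math

# Placeholder coefficients (log of the Cox unit hazard ratios), for illustration only.
COEF_THICKNESS = math.log(6.96)   # per 10-fold increase in thickness
COEF_DCCD = math.log(1.43)        # per 10-fold increase in DCCD + 1
COEF_ULCERATION = math.log(2.04)  # ulcerated versus not ulcerated


def risk_score(thickness_mm, dccd, ulcerated):
    """Sum of predictors multiplied by their regression coefficients."""
    return (
        COEF_THICKNESS * math.log10(thickness_mm)
        + COEF_DCCD * math.log10(dccd + 1.0)
        + COEF_ULCERATION * (1.0 if ulcerated else 0.0)
    )


print(risk_score(0.5, 0, False))   # thickness below 1 mm gives a negative score
print(risk_score(1.0, 0, False))   # reference point: score of zero
print(risk_score(10.0, 9, True))
```

A non-ulcerated 1-mm melanoma without DCCs scores zero under this parameterization, so thinner tumors without further risk factors fall below zero.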
We performed statistical analyses with JMP (version 10.0.2).

Patients
The final analysis included a total of 1,834 sentinel nodes from 1,027 patients examined by quantitative immunocytology and by histopathology. The baseline characteristics of all 1,027 patients are summarized in Table 1. The median follow-up was 49 mo (range 3 to 123 mo), with 370 (36%) patients having a follow-up of at least 5 y. During follow-up, 138 of 1,027 patients (13.4%) died from melanoma. The 5-y melanoma-specific survival probability for all patients was 86% (95% CI 83%-88%).

Detection of Disseminated Melanoma Cells by Immunocytology
The underlying rationale of our detection assay (Figure 1A-1E) comprises two aspects. First, the spatially inhomogeneous distribution of melanoma cells may be equalized by lymph node disaggregation and generation of a single-cell suspension, which facilitates melanoma cell detection when only parts of the sample are screened; second, the number of melanoma cells can be counted and the amount of analyzed tissue can be quantified by referencing the number of melanoma cells to a defined number of isolated lymph node cells. This allows determining the DCCD, i.e., the number of DCCs per 1 million isolated cells.
We first assessed whether gp100-positive cells could be detected in skin draining nodes from non-melanoma patients. These lymph nodes were prepared identically to the sentinel nodes from melanoma patients (Figure 1B-1D), except that lymph nodes from cancer patients had to be split in half to provide tissue for routine histopathology. We could not detect a single gp100-positive cell among 171 × 10^6 cells isolated from 58 non-melanoma skin draining lymph nodes. On the other hand, we detected gp100-positive cells in the lymph nodes of 525 of the 1,027 melanoma patients (51%). Whenever enough cells were isolated from the lymph node half for immunocytology, we aimed to screen 2 × 10^6 lymph node cells, i.e., two slides, per patient. The median number of slides screened per node was two (range 10^4 to 6 × 10^6 cells). The median DCCD in patients with DCCD > 0 was 4 gp100-positive cells per million isolated cells (ranging from 0.2 to 950,000; Figure 1F). We evaluated the relation of DCCD with the six established prognostic factors (Figure 2). Geometric mean values of DCCD were significantly higher in thicker and ulcerated melanomas (p<0.001), in melanomas located at other sites than the extremities (p = 0.02), and in patients with a pathologically positive sentinel node (p<0.001; Figure 2).
However, severe concerns about the immunocytological assay may be raised by (1) the loss of architectural information, which helps to differentiate between intra-lymphatic nevi and colonies of melanoma cells, (2) the difficulty of identifying melanoma colonies by morphological criteria, and (3) the fact that the gp100 antigen used for melanoma detection may be down-regulated. We addressed these concerns by careful evaluation of lymph node preparations from melanoma and non-melanoma patients using a second melanoma-associated antibody directed against Melan-A and by genetic analysis of the gp100-positive cells.
We found Melan-A-expressing cells in three out of 38 (8%) control lymph nodes, all of which were gp100 negative. We then proceeded to determine the detection rate of the two antibodies for melanoma cells in sentinel nodes from melanoma patients with histopathologically proven lymphatic spread by applying a double staining method. Because Melan-A-positive cells were detected in 8% of control nodes, we restricted the direct comparison of gp100- and Melan-A-positive cells to samples from histopathologically [...] These data suggest that the slightly higher detection rate of the Melan-A antibody does not outweigh the lower specificity as determined by the control samples. However, to completely rule out that gp100-negative DCCs comprise a relevant confounding factor, additional slides were stained using the Melan-A antibody in 710 patients. Comparing the gp100 and Melan-A staining results of these 710 patients, we confirmed the high correlation of gp100 and Melan-A staining (r = 0.83, p<0.001) that we had previously seen by double immunofluorescence. As detailed in Table S3, we found that the inclusion of Melan-A did not improve the prognostic power of the gp100-based immunocytological assay.

Genetic Characteristics of Disseminated Melanoma Cells
Since all these findings provided indirect support that gp100-positive cells represent DCCs, we searched for direct evidence of their malignant origin. We randomly isolated 65 gp100-positive cells from 46 patients for a whole-genome screen of chromosomal aberrations by CGH. The DCCD values of these patients ranged from 0.2 to 800,000 gp100-positive cells per million isolated cells (median = 8), and we analyzed between one and three cells per patient. Metaphase CGH provided direct proof for the malignant origin of 57 gp100-positive cells (Figure 3A), while eight cells displayed normal karyotypes. As metaphase CGH has a resolution of 10-20 Mb, we subsequently applied array CGH [35], which has a resolution of <1 Mb, to these eight cells. While we could not detect any aberration in two cells, the remaining six displayed between one and ten changes (median = 4.5) ranging from 0.1 to 19 Mb (median = 2 Mb). In summary, 63 of 65 gp100-positive cells (97%) displayed genomic aberrations, which classified 45 of 46 patients (98%) as harboring cancer cells in their sentinel nodes. There was no difference for cells isolated from lymph nodes classified as negative or positive by routine histopathology, demonstrating that our assay is suited to correctly identifying melanoma cells without morphological assessment of tissue architecture (Figure 3B).

Disseminated Cancer Cell Density and Melanoma-Specific Survival
We evaluated DCCD as a biomarker according to the REMARK criteria [25]. Of the standard prognostic factors, sentinel node histopathology (p<0.001), age (p<0.001), thickness (p<0.001), ulceration (p<0.001), and localization of the primary melanoma (p = 0.04) were associated with poor outcome in the univariable Cox regression analyses (see Table 2 and Figure S2 for Kaplan-Meier estimates). Increasing DCCD values were negatively associated with the time to death from melanoma in the univariable Cox regression analyses (p<0.001). We assessed the prognostic impact of DCCD after categorizing the values into four groups (Table 2). We found that even the detection of low DCCD values (0 < DCCD ≤ 3) conferred a significant risk of death (hazard ratio 1.63, 95% CI 1.02-2.58, p = 0.04; Table 2 and Figure 4A) compared to patients without DCCs. Increasing hazard ratios were obtained for categories with higher DCCD values (Table 2). The relationship of increasing DCCD values and the hazard ratio is plotted in Figure 4B on the logarithmic scale. The unit risk ratio (corresponding to a 10-fold increase of DCCD + 1, e.g., from a DCCD of zero to a DCCD of nine) was 1.81 (95% CI 1.61-2.01), and a linear relation (on log scale) between DCCD and hazard ratio was identified (Figure 4B). A similar log-linear relationship was seen between tumor thickness and hazard ratio (Figure 4C).
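The log-linear relation can be made concrete with a short calculation. Assuming the univariable model with log10(DCCD + 1) as the only predictor and the reported unit risk ratio of 1.81, the hazard ratio relative to DCCD = 0 follows directly; the function name is hypothetical.

```python
import math

UNIT_RISK_RATIO = 1.81  # reported hazard ratio per 10-fold increase of DCCD + 1


def hazard_ratio_vs_zero(dccd, unit_ratio=UNIT_RISK_RATIO):
    """Hazard ratio relative to DCCD = 0 under the log-linear univariable model."""
    return unit_ratio ** math.log10(dccd + 1.0)


for value in (0, 9, 99, 999):
    print(value, round(hazard_ratio_vs_zero(value), 2))
```

Each further 10-fold increase of DCCD + 1 multiplies the hazard by the same factor, so a DCCD of 99 corresponds to 1.81 squared relative to a DCCD of zero.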
We next performed stepwise multivariable Cox regression analysis starting with all six standard prognostic factors in addition to DCCD. After each step of the multivariable analysis the variable with the highest p-value was deleted (Tables 3 and 4). To identify the optimal model, we determined the BIC, which has a minimal value for the best model [22]. As can be seen from Table 4, the BIC value is lowest for the combined variables tumor thickness, DCCD, and ulceration, for which all p-values were below 0.001. The unit hazard ratios for this model were 6.96 (95% CI 3.61-13.28) for thickness, 1.43 (95% CI 1.27-1.61) for DCCD, and 2.04 (95% CI 1.40-2.97) for ulceration. It should be noted that nodal status determined by routine histopathology had a maximum hazard ratio of 1.75 (95% CI 1.04-2.86) in multivariable analyses and was already rejected in step 3 (Table 3).

Individual Risk Prediction by Tumor Thickness, Disseminated Cancer Cell Density, and Ulceration
To fully exploit the power of our quantitative assay, we combined the three most important risk factors identified by multivariable analysis (tumor thickness, DCCD, and ulceration) for individual risk assessment at diagnosis and during follow-up. While results of Cox models represent a useful summary for the average hazard ratios, we observed that the assumption of proportional hazards was not fulfilled for tumor thickness and DCCD (Figure S3). Therefore, we employed a lognormal survival model based on tumor thickness, DCCD, and ulceration that allows the calculation of changes in individual risk over time and of the predicted 5-y survival for all 1,027 patients (Figure 5A). Figure 5A shows that patients with thin melanomas never harbored high DCC numbers in their lymph nodes and that the poorest outcome was seen for thick tumors and high DCCD. It should be noted that DCCD and tumor thickness are plotted on a logarithmic scale, and therefore the curves of equal 5-y survival probability appear as straight lines. On a linear scale these curves (isoboles) are convex (Figure S4), which indicates synergism [26]. Using this model, we calculated the time-dependent hazard rates for 14 individual patients with five different hazard rate curves (Figure 5B). This calculation revealed that DCC-negative and DCC-positive patients might display identical hazard rate functions (e.g., compare Patients 2a and 2b in Figure 5B) and also that the hazard rate peaks later in low-risk than in high-risk patients. Furthermore, tumor thickness, DCCD, and ulceration state can be integrated into a preliminary nomogram to determine the 5-y survival of individual patients (Figure 5C).
Figure 4. (B) Hazard ratios are given for the DCCD groups of Table 2 together with their 95% confidence intervals as a function of their median DCCD values on a log scale. The red line provides the hazard ratios for the linear model where log(DCCD + 1) is entered as a continuous variable. The predicted curve is within the confidence intervals. The unit risk ratio 1.81 is the hazard ratio corresponding to a 10-fold increase of DCCD + 1. (C) Relation of Breslow's thickness and risk of death from melanoma. Hazard ratios (in black) are given for the three groups of Table 2 (melanomas with 2 mm ≤ tumor thickness ≤ 4 mm; 4 mm < tumor thickness ≤ 8 mm; tumor thickness > 8 mm) together with their 95% confidence intervals as a function of their median thickness values on a log scale. The red line shows the predicted hazard ratios for the model where log(thickness) enters as a continuous variable. We show only the values ≥1.3 mm, i.e., the median value of the reference group in Table 2. The predicted line is within the confidence intervals. The unit hazard ratio 18.6 corresponds to a 10-fold increase of thickness. doi:10.1371/journal.pmed.1001604.g004

Internal Validation of the Model
While the present survival model based on the predictors thickness, DCCD, and ulceration awaits validation by an independent multi-center study, we sought to validate it internally. For this we applied a bootstrapping approach [23]. We generated 100 bootstrap samples from the original dataset by sampling with replacement. For each of these 100 bootstrap samples the same model selection procedure as for the original dataset was applied. We obtained 14 different "best models" (Table 5). The present model was selected most often. The variable DCCD was included in 85 best models, whereas nodal status by routine histopathology was included in only 36 of the 100 models. Harrell's c-index, which estimates the probability of concordance between predicted and observed responses, for the present model was 0.763 in the original dataset. Harrell's c-index based on the current AJCC staging system was 0.737. This is significantly smaller (p<0.0001; McNemar's test). The bootstrap-corrected c-index [23] for the present model turned out to be 0.748, which is well above the value of 0.5 representing only random prediction ability.
We then analyzed those patients for whom the predictions of the AJCC and the new model differed (survival probability in Group S1: new model > AJCC; in Group S2: new model < AJCC; Group S3: remaining patients). We calculated the expected number of deaths at the observed follow-up time for each patient and compared this number with the observed number of deaths (Table S2). Only the new model provided an acceptable fit for all three groups. In Group S2 of Table S2 the number of deaths predicted in the AJCC model was significantly different from the number of observed deaths (p<0.0001).
Finally, we combined Groups S1 and S3 from Table S2 to form two groups. Group 1 now comprised patients for whom the novel model predicted better survival than the AJCC model and patients for whom the predictions of both models concurred. Patients for whom the novel model predicted a worse survival than the AJCC model formed Group 2.
For the time points 3 and 6 y after sentinel lymph node biopsy we determined which model provided a better fit for the survival of patients in Groups 1 and 2. Kaplan-Meier plots demonstrated that [...] (Figure 6C). For both time points 3 and 6 y (p<0.001 and p<0.01, respectively) the AJCC model significantly deviated from Kaplan-Meier estimates for Group 2 patients, whereas our model correctly predicted the number of deaths (Table 6). For the low-risk patients of Group 1, both models provided acceptable fits, although we noted a borderline p-value (p = 0.06) for the new model at 3 y. However, the fit for predicted and observed survival becomes excellent for the new model over time (Figure 6C), in line with the need for longer observation periods in low-risk patients. Thus, at 6 y follow-up there is perfect agreement for the new model, whereas the AJCC model overestimates the number of deaths (Figure 6C and Table 6).
We assessed the goodness of fit of the model after grouping the patients into four groups according to their risk score, a linear combination of the logarithms of tumor thickness and DCCD and of ulceration (see Methods). We compared predicted and observed survival curves and found that Kaplan-Meier curves and predicted curves were superimposable over the complete range of disease courses for all four risk groups and that all four groups differed significantly from each other (Figure 7). Finally, we compared the goodness of fit for several models (a model based on Cox regression analysis, a model that includes data on Melan-A staining, and a model based on the current AJCC criteria). In summary, we found that the parametric model based on thickness, DCCD, and ulceration most accurately predicted melanoma death (Table S3).
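The risk score described above can be written schematically as follows. This is only a sketch of its general form: the symbols, the coefficient names, and the offset of 1 inside the DCCD logarithm (suggested by the phrase "DCC density + 1" in the Editors' Summary, and needed because DCCD can be 0) are assumptions, and neither the base of the logarithm nor the fitted coefficient values are given in this section.

```latex
% r: risk score; T: tumor thickness in mm; U: ulceration (1 = present, 0 = absent)
% \beta_T, \beta_D, \beta_U: fitted coefficients (values not stated here)
r = \beta_T \,\log(T) + \beta_D \,\log(\mathrm{DCCD} + 1) + \beta_U \, U
```

Because the three terms are additive, different combinations of thickness, DCCD, and ulceration can yield the same score and therefore the same predicted survival.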

Discussion
In this study, we quantified the number of DCCs per one million isolated lymph node cells (DCCD) and assessed its utility in predicting melanoma outcome. Based on a median follow-up of 49 mo, with 370 patients having follow-up times of more than 5 y, we found that at the time of sentinel node biopsy, quantitative assessment of DCCD predicted melanoma outcome by univariable and multivariable analysis in a large cohort of patients. Furthermore, quantitative DCCD showed a stronger association with outcome than qualitative conventional histopathology and, when combined with primary melanoma thickness and ulceration, had a synergistic impact on patient survival. Using these variables we developed a parametric model that proved to be the most accurate for predicting outcome. Although we currently lack an external validation cohort with long follow-up, the accepted prognostic role of sentinel lymph node spread [4] and the successful internal validation (bootstrap and goodness of fit) give credibility to the findings.
We found that even the detection of three or fewer DCCs per million leukocytes in the sentinel node increases the risk of death at 5 y from melanoma by 6 percentage points (8% for DCCD = 0 versus 14% for 0 < DCCD ≤ 3). This finding is in line with reports supporting the clinical relevance of single DCCs [5,27] and the novel AJCC recommendation [4] to refrain from using a lower threshold for sentinel node spread. However, we also identified three shortcomings of the AJCC categorization approach. First, AJCC staging does not differentiate between isolated cancer cells and small and large microscopic metastases. Our data demonstrate that the number of cells matters over the full range of DCCD. Second, because any measured DCCD value can be translated into a 5-y survival rate, the typical exaggerations of categorizing staging systems, such as upgrading from stage II to stage III because of the detection of a single melanoma cell, are avoided, and individual disease courses can be accommodated better than by categorizing tumors based on the AJCC staging system. Third, we provide clinical evidence for the context dependency of the metastasis-forming potential of DCCs, which emerges from our observation that DCCD, tumor thickness, and ulceration (the leading prognostic factors from the multivariable analysis) can be combined in a parametric survival model where the prognostic value of a single DCC differs for thick and thin tumors. For example, we observed DCCs in 46% of T1 stage melanomas; however, 5-y survival rates are more than 90% in this subgroup of patients, indicating that under most conditions DCCs do not result in clinically relevant metastasis. This may suggest that cellular programs such as senescence or dormancy are activated at initial homing to distant sites [28,29] but may be released once primary tumors grow large. Such a scenario has gained credibility since secreted factors of primary melanomas, such as exosomes, were shown to evoke substantial systemic effects [30] promoting metastasis.
In addition to DCCD and thickness of the primary melanoma, ulceration status had an impact on survival in melanoma. The biological interdependencies between destructive growth (ulceration), tumor-mass-induced systemic alterations (tumor thickness), and metastatic dissemination (DCCD) for progression of an individual melanoma may explain why histopathologically node-positive patients can have a better outcome than histopathologically node-negative patients if the primary melanoma has more favorable prognostic features. This phenomenon is not reflected within a categorizing staging system. For example, the current AJCC staging predicts a 5-y survival of 53% for stage IIC (T4bN0M0) and 70% for stage IIIa (T1-4N1aM0). In contrast, the estimated survival of a patient with tumor thickness 7.4 mm and DCCD = 0 but without ulceration is identical to the estimated survival of a patient with ulceration and tumor thickness 3.35 mm and DCCD = 2, or with tumor thickness 1.2 mm and DCCD = 772 in our model.
Since summary measures of survival may provide insufficient information about population dispersion, we asked whether the new model reflects the prognostic heterogeneity of patients more accurately. Indeed, we identified a group of patients at high risk for progression in whom the AJCC model underestimates the risk of death. Although this group of patients is relatively small (13% in our cohort), these patients will most likely benefit from adjuvant therapy, and the model may help to improve patient stratification for clinical trials. It also identified a group of very low risk patients who have an excellent long-term outcome and whose risk of dying is overestimated by the AJCC staging model.
We carefully evaluated the performance of our assay. As lymph node disaggregation destroys the tissue architecture, some morphological criteria to identify melanoma cells are lost. However, we deem it unlikely that benign nevus cells in sentinel nodes, described in up to 28% of melanoma patients [9], confound our conclusions. These cells rarely express gp100 [31], and likewise we could not find gp100-expressing cells in nonmelanoma lymph nodes. While this does not rule out the possibility that truly DCC-negative sentinel nodes from melanoma patients may contain gp100-expressing benign nevus cells, our finding that even low numbers of gp100-positive cells are prognostically relevant would then suggest that gp100-positive nevus cells may be prognostically informative. Furthermore, in 97% of all analyzed gp100-positive cells we detected chromosomal or subchromosomal alterations. For only one patient out of 46 could we not confirm the malignant descent of the isolated cell. In all other cases genetic alterations in gp100-positive cells proved disseminated melanoma, suggesting that morphological criteria for DCC identification are dispensable.
Figure 5 (caption fragment). [...] nomogram enables estimating the 5-y survival probability. The 5-y survival probability is the value at the intersection of a straight line connecting DCCD and tumor thickness with the central straight line. For illustration, the blue lines exemplify the same Patients 1 and 5 that are marked in (A) and described in (B). Patient 5 had an ulcerated melanoma, and therefore the red scale must be used, while Patient 1 had a non-ulcerated melanoma, so survival probability is given on the black scale. Note that the nomogram has not been independently validated and therefore is not yet suitable for clinical use. doi:10.1371/journal.pmed.1001604.g005
Table 5. Variables included in the 14 best models found in the 100 bootstrap samples created for internal validation.
Then, we directly addressed the question of whether staining for another antigen (Melan-A) increases the detection rate and the prognostic power of the gp100-based immunoassay. However, while Melan-A staining added a few samples (7%) to the gp100-identified positive lymph nodes, it also stained 8% of control nodes. Moreover, assessing the prognostic power of the combined results of gp100 and Melan-A staining for 710 patients, we found that the gp100-only model was more accurate.
Compared to our assay, evaluation of sentinel nodes by pathology has two major limitations. First, sensitivity largely depends on the number of slides examined. Second, quantification of lymphatic melanoma spread, a three-dimensional and often multilocular process, is impossible by histopathology. We resolved these problems by homogenizing the patchy spatial distribution of tumor cells within the node [32], which greatly impacts detection in tissue sections but less so in our approach, and counting the stained cells. Thus, screening of a median of only two slides (2 × 10^6 cells) per node revealed a detection rate of 51%, whereas pathology was positive in only 14% of patients. To achieve a similar sensitivity by histopathology, it has been suggested that more than 36 slides per sample need to be analyzed, indicating that immunocytology might be advantageous also for practical reasons [7,12]. In the future, both lymph node preparation and screening may even be subjected to partial automation and thereby decrease workload further.
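The counting step lends itself to a short worked example. The Python sketch below normalizes a count of stained cells to DCCs per one million screened lymph node cells, which is how DCCD is defined in this study; the function name, argument names, and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def dccd(positive_cells, cells_screened):
    """Disseminated cancer cell density: gp100-positive cells per
    one million lymph node cells screened."""
    if cells_screened <= 0:
        raise ValueError("cells_screened must be positive")
    if positive_cells < 0:
        raise ValueError("positive_cells must not be negative")
    return positive_cells * 1_000_000 / cells_screened


# Hypothetical example: 3 stained cells found on two slides,
# i.e. among 2 x 10^6 screened cells.
example_dccd = dccd(positive_cells=3, cells_screened=2_000_000)
```

In this made-up case the density is 1.5 DCCs per million cells. Normalizing to the number of cells screened keeps samples comparable when different numbers of slides are evaluated.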
The high detection rate of immunocytology is reminiscent of the sensitivity of RT-PCR methods, which is also around 50% [33]. However, despite 20 y of clinical evaluation, RT-PCR assays have failed to become clinical routine. Since one of our major findings is the quantitative impact of lymphatic cancer cell dissemination on patient outcome, we deem the non-quantitative nature of RT-PCR assays and the failure to prove the malignant melanoma origin of the detected nucleic acids to be a likely explanation for their failure. RT-PCR assays do not measure cell numbers but transcript numbers, which may be generated by a few high-expressing cells or many low-expressing cells. Since the unit of selection during malignant progression is a cell and not a transcript, even quantitative transcript information will always represent a qualitative assessment of cancer spread. Therefore, RT-PCR methods are unable to provide the information delivered here that the prognostic weight of a single disseminated melanoma cell is context dependent.
In summary, we provide evidence that quantification of lymphatic cancer cell dissemination is feasible and can be combined with other quantitative and qualitative characteristics of the primary tumor for accurate individual outcome prediction, probably not only for melanoma but also for other types of solid cancer [34]. It will be important to validate the findings in an independent study before the assay and the prediction model are used clinically.

Editors' Summary
Background. Because the skin contains many different cell types, there are many types of skin cancer. The most dangerous type, melanoma, develops when mutations occur in melanocytes, the cells that produce the pigment melanin. Less than 5% of skin cancers are melanomas, but melanoma causes most skin cancer deaths. Early signs of melanoma are a change in the appearance of a mole (a pigmented skin blemish) or the development of a new and unusual pigmented lesion. If these signs are noticed and the melanoma is diagnosed before it has spread from the skin into nearby lymph nodes and other tissues, surgery often provides a cure. For advanced melanomas, the outlook is generally poor, although novel therapies may prolong a patient's life.

Supporting Information
Why Was This Study Done? When a person is diagnosed with melanoma, it is important to "stage" the tumor. Knowing the extent and severity of the melanoma helps oncologists plan treatments and estimate their patients' likely outcomes. The detection of isolated melanoma cells in sentinel lymph nodes (the nodes to which cancer cells are most likely to spread from a primary tumor) is included in melanoma staging recommendations. However, finding rare tumor cells in sentinel lymph node biopsies by examining the tissue requires the analysis of many slides from each node removed from the patient and is extremely time-consuming. In this study, the researchers investigate the predictive value of a quantitative immunocytological assay that involves disaggregation of the sentinel node and detection of disseminated cancer cells (DCCs) by immunostaining for gp100 (a marker for melanoma cells). They also use this new assay to examine the effect of increasing numbers of DCCs on melanoma-specific survival.
What Did the Researchers Do and Find? The researchers used routine histopathology and immunocytology to analyze 1,834 sentinel lymph nodes from 1,027 patients with melanoma who underwent sentinel lymph node biopsy at one German hospital. For immunocytology, the researchers recorded the number of gp100-positive cells per million lymph node cells (the DCC density). During follow-up, 138 patients (13.4%) died from melanoma. The results indicated that increased DCC density was associated with an increased risk of death due to melanoma. Specifically, every 10-fold increase in DCC density + 1 was associated with a near doubling of the risk of death from melanoma (a hazard ratio of 1.81). Even patients with three or fewer gp100-positive cells per million lymph node cells had an increased risk of dying from melanoma compared to patients with no gp100-positive cells (hazard ratio 1.63). When other predictors of outcome such as age and primary tumor location were taken into account, DCC density was a stronger predictor of death than histopathology. Finally, a survival model that included tumor thickness, tumor ulceration, and DCC density provided survival prediction superior to that of a model based on the current standard staging recommendations.
What Do These Findings Mean? These findings show that quantification of cancer cell dissemination from melanomas to sentinel lymph nodes is feasible and can be combined with other characteristics of the primary tumor to provide an accurate prediction of outcomes for individual patients with melanoma. Notably, the new prediction model identifies a group of patients at high risk of progression for whom the current clinical standard underestimates the risk of death. These patients may benefit from adjuvant therapies, so the new analysis presented in this study may help to stratify patients for clinical trials. Importantly, quantitative immunocytology and the new model, although internally validated in this study, need to be validated in an independent group of patients before they can be considered for routine clinical use. If external validation is successful, quantitative immunocytology, which is much less labor-intensive than histopathology, has the potential to change the routine clinical care of patients with melanoma and probably with other solid tumors, conclude the researchers.