Harmonization of the Volume of Interest Delineation among All Eleven Radiotherapy Centers in the North of France

Background
Inter-observer delineation variation has been documented for many years in almost every tumor location. Inadequate delineation can impair the chance of cure and/or increase toxicity. The aim of our original work was to prospectively improve the homogeneity of delineation among all of the senior radiation oncologists in the Nord-Pas de Calais region, irrespective of the conditions of practice.

Methods
All 11 centers were involved. The first studied cancer was prostate cancer. Three clinical cases were studied: a low-risk prostate cancer case (case 1), a high-risk prostate cancer case (pelvic nodes, case 2) and a case of post-operative biochemical PSA elevation (case 3). All of the involved physicians delineated the clinical target volume (CTV) and organs at risk as in their usual practice. The volumes were compared using validated indexes: the volume ratio (VR), common and additional volumes (CV and AV), volume overlap (VO) and Dice similarity coefficient (DSC). A second delineation of the same three cases was performed after a slice-by-slice discussion of the results and the choice of shared guidelines, to evaluate homogenization. A comparative analysis of the indexes before and after discussion was conducted using the Wilcoxon test for paired samples. A p-value less than 0.05 was considered to indicate statistical significance.

Results
The indexes were not improved in case 1, for which the inter-observer agreement was considered good after the first comparison (DSC = 0.83±0.06). In case 2, the second comparison showed homogenization of the CTV delineation with a significant improvement in CV (81.4±11.7 vs. 88.6±10.26, respectively; p = 0.048), VO (0.41±0.09 vs. 0.47±0.07, respectively; p = 0.009) and DSC (0.58±0.09 vs. 0.63±0.07, respectively; p = 0.0098). In case 3, VR and AV were significantly improved: VR, 1.71 (±0.6) vs. 1.34 (±0.46), respectively, p = 0.0034; AV, 46.58 (±14.50) vs. 38.08 (±15.10), respectively, p = 0.0024. DSC was not improved, but it was already superior to 0.6 in the first comparison.

Conclusion
Our prospective work showed that a collaborative discussion about clinical cases and the choice of shared guidelines within an established framework improved the homogeneity of CTV delineation among the senior radiation oncologists in our region.


Introduction
The Nord-Pas de Calais region is the fourth most populated of the 22 French metropolitan regions, with 4.052 million inhabitants. This region includes the North and Pas-de-Calais departments and represents 6.2% of the French population. It is one of the most densely populated regions, with 326 inhabitants/km² compared with 115 inhabitants/km² in metropolitan France. In Lille and the surrounding areas, where the seat of the Regional Council of Nord-Pas-de-Calais is located, a comparison with national data shows an increased incidence of head and neck, esophageal, lung, liver, bladder, kidney, colorectal, uterine and ovarian cancers [1]. The region has 11 radiotherapy centers.
Radiotherapy plays a key role in the treatment of cancer. The highly conformal dose distributions produced using modern techniques require careful delineation of target volumes and organs at risk (OARs). An inadequate radiotherapy plan can diminish the chance of a cure and/or increase the risk of toxicity. The quality of radiotherapy plans affects the outcome of chemoradiotherapy in head and neck cancer [2]. In a meta-analysis of eight studies (four in pediatric and four in adult patients), the frequency of quality assurance deviations ranged from 8% to 71%, and radiotherapy deviations were associated with a statistically significant decrease in overall survival (HR of death = 1.74, 95% confidence interval [CI] = 1.28 to 2.35; p < 0.001) [3].
Prostate cancer is the second most common cancer in men and remains the most common cancer in developed countries [4]. Inter-observer variability in the definition of target volumes has been well established since the beginning of conformal and intensity-modulated radiotherapy for prostate cancer [5][6][7]. The aim of our work was to improve the delineation homogeneity among the radiation oncologists in the Nord-Pas de Calais region through collaborative discussions concerning clinical cases and the selection of shared guidelines.

Materials and Methods
All 11 centers were involved: eight private, two with mixed public-private activity and one academic department of radiation oncology. The first studied cancer was prostate cancer. Three fictitious clinical cases were sent to all of the centers. Each case included a detailed description of the clinical history, histologic or anatomopathologic data and anonymized computed tomography (CT) images. Low- and high-risk (pelvic nodes) prostate cancer cases according to the D'Amico classification and a case of post-operative biochemical PSA elevation were studied. In case 1, anonymized magnetic resonance (MR) images for image fusion were also sent. A detailed description of the three cases and the volumes to be delineated is presented in Table 1.
These data were sent via the P2E workstation (AQUILAB SAS) with which each center is equipped. All of the involved physicians delineated the clinical target volume (CTV) and OARs as in their usual practice. After delineation, each center sent the data to Onco-npdc, where contours were compared (C. Viot); the delineation was also anonymized. The following validated indexes were used for delineation comparison: the volume ratio (VR), common and additional volumes (CV and AV), volume overlap (VO) and Dice similarity coefficient (DSC) (Table 2) [8][9][10][11][12]. The contours of one randomly selected participant served as the "reference" (method 1). The same participant was selected for all three cases during the two comparisons. Indeed, the aim of the present study was to increase the homogeneity of delineation, and we hypothesized that the choice of the "reference" contours did not significantly influence the results. We also compared each contour with a common contour comprising the delineations of 9/14 physicians (method 2). This method facilitates the evaluation of delineation harmonization and avoids the selection of a random or reference contour [13]. The results were discussed slice by slice by senior and junior radiation oncologists during three meetings a year, and shared guidelines were selected for each clinical case. A second delineation of the same three cases was then performed to quantify the standardization. The first delineation was conducted during the month prior to the meeting, and the second delineation was completed in the month following the meeting. The same methodology and indexes as in the first comparison were used. The OAR delineations were not compared. Statistical analyses were performed using JMP (Version 10; SAS Institute Inc., SAS Campus Drive, Cary, North Carolina).

Table 2 abbreviations: V_R: volume of the reference contour; V_n: volume of the contour to be compared; C_R: reference contour; C_n: contour to be compared.
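To make the five comparison indexes concrete, the following is a minimal sketch that computes them on voxel sets. The formulas of Table 2 are not reproduced in this text, so the definitions used here are assumptions: VR as the ratio of the compared volume to the reference volume, CV as the share of the reference volume that is common to both contours (in %), AV as the share of the compared volume lying outside the reference (in %), VO as intersection over union, and DSC as twice the intersection over the sum of the volumes. The function name `overlap_indexes` and the helper `box` are hypothetical and are not part of the software used in the study.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def overlap_indexes(reference: Set[Voxel], other: Set[Voxel]) -> Dict[str, float]:
    """Volumetric agreement between a reference contour and a compared contour.

    Contours are sets of voxel coordinates; a volume is a voxel count.
    The CV and AV formulas are assumed definitions, not taken from Table 2.
    """
    if not reference or not other:
        raise ValueError("both contours must contain at least one voxel")
    v_ref = len(reference)            # V_R, volume of the reference contour
    v_n = len(other)                  # V_n, volume of the compared contour
    inter = len(reference & other)    # voxels shared by both contours
    union = len(reference | other)    # voxels in at least one contour
    return {
        "VR": v_n / v_ref,                  # volume ratio
        "CV": 100.0 * inter / v_ref,        # common volume, % of reference
        "AV": 100.0 * (v_n - inter) / v_n,  # additional volume, % of compared
        "VO": inter / union,                # volume overlap
        "DSC": 2.0 * inter / (v_ref + v_n), # Dice similarity coefficient
    }


def box(x0: int, x1: int, y0: int, y1: int, z0: int, z1: int) -> Set[Voxel]:
    """Hypothetical helper: all voxels of an axis-aligned box (upper bounds excluded)."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


# Toy example: two 10x10x10 boxes, the second shifted by 2 voxels along x.
reference = box(0, 10, 0, 10, 0, 10)
other = box(2, 12, 0, 10, 0, 10)
result = overlap_indexes(reference, other)
```

In this toy example the two boxes share 800 of their 1000 voxels, which gives VR = 1, CV = 80%, AV = 20%, VO of about 0.67 and DSC = 0.8.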
A comparative analysis of the indexes before and after discussion was performed using the Wilcoxon test for paired samples. The VO and DSC of the three cases were also compared (Mann-Whitney test for unpaired samples). A p-value less than 0.05 was considered to indicate statistical significance.
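The paired comparison described above can be illustrated with a small, self-contained Wilcoxon signed-rank test. This is only a sketch under stated assumptions: it computes an exact two-sided p-value by enumerating all sign assignments (reasonable for the small samples of this kind of study), it drops zero differences, and it is not meant to reproduce the exact options used by the statistical software. The function names and the DSC values in the example are hypothetical and are not study data.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank(before: Sequence[float],
                         after: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, exact two-sided p-value) for paired samples.

    Zero differences are dropped. The null distribution is obtained by
    enumerating all 2**n sign assignments, so n should stay small.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Rounding avoids spurious rank differences caused by floating-point noise.
    diffs = [round(a - b, 9) for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2.0
    observed_dev = abs(w_plus - expected)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical DSC values for eight observers (illustration only, not study data).
dsc_before = [0.50, 0.55, 0.60, 0.52, 0.58, 0.61, 0.49, 0.57]
dsc_after = [0.51, 0.57, 0.63, 0.56, 0.63, 0.67, 0.56, 0.65]
w_plus, p_value = wilcoxon_signed_rank(dsc_before, dsc_after)
significant = p_value < 0.05
```

Because every observer improves in this illustrative data set, the sum of positive ranks takes its maximum value and the exact two-sided p-value is 2/256, below the 0.05 threshold.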

Ethics
All of the participating physicians were volunteers. Each one signed a document in which he or she agreed to collaborate on the work (S1 File). This study was financed by several institutions and participating centers (please see the Acknowledgments section) and was administered by the Regional Cancer Network Onco-npdc. According to French law, this work did not require the opinion of an ethics committee. Agreement N1034071 was obtained from the "National Commission for Data-collection and Freedom" ("Commission Nationale Informatique et Liberté") for the conduct of this work. Anonymized CT and MR images were used for the development of fictitious but realistic clinical cases. D.P. was responsible for initially collecting these data and for anonymizing them. No participant had access to the patient data prior to anonymization. C.V. was responsible for collecting the anonymous results of delineation. None of the authors or participants were involved in the patients' medical treatment.

Results
Fourteen physicians involved in the treatment of urologic cancers at the 11 centers participated.

In case 1 (low-risk prostate cancer), the first comparison using method 1 showed acceptable agreement with a DSC value of 0.83 (±0.06). Despite the use of MR images, some differences were observed in the apex and base delineations (Fig 1A, 1B and 1C). The chosen guideline was that of the European Organisation for Research and Treatment of Cancer (EORTC) [14]. The indexes were not improved during the second comparison but remained acceptable, with a DSC of 0.83 (±0.08) (Table 3).

Concerning case 2, the differences in the CTV delineation were mainly located at the inferior and medial borders of the obturator area, the inferior border of the pre-sacral and external iliac areas and the superior border of the common iliac area (Fig 2). The chosen guidelines were those of the Radiation Therapy Oncology Group (RTOG) [15]. Using method 1, the second comparison showed homogenization of the CTV delineation with a significant improvement in VO (0.41±0.09 vs. 0.47±0.07, p = 0.009) and DSC (0.58±0.09 vs. 0.63±0.07, p = 0.0098) (Table 3). The AV also improved from 41.07 (±10.98) to 33.86 (±9.42), approaching borderline significance (p = 0.07).

Concerning case 3, the differences in the CTV delineation were located at the superior and inferior boundaries and at the anterior and superior border of the volume, where the CTV moves away from the posterior edge of the pubic symphysis (Fig 3). The chosen guidelines were those of the Radiation Therapy Oncology Group (RTOG) [16]. During the second delineation, VR and AV were significantly improved using method 1: 1.71 (±0.6) vs. 1.34 (±0.46), p = 0.0034, and 46.58 (±14.50) vs. 38.08 (±15.10), p = 0.0024, respectively. The CV was significantly decreased, probably in relation to the large decrease in the volume ratio. DSC was not improved, but it was already superior to 0.6 in the first comparison (Table 3).
Analysis of the images showed a standardization of the delineation of the anterior and superior borders of the CTV (Fig 3). The VO and DSC were compared between cases 1, 2 and 3 for comparisons 1 and 2, using method 1. These indexes were significantly better in case 1 than in cases 2 and 3 (p<0.05) in comparisons 1 and 2. No significant difference was observed between cases 2 and 3.

Table 3. Comparison indexes for the three cases (method 1). Columns: Case, VR (±SD), CV (±SD), AV (±SD), VO (±SD), DSC (±SD).

Discussion
The aim of our original work was to prospectively improve the homogeneity of the delineation among all of the senior radiation oncologists in the North of France, regardless of the conditions of practice. To the best of our knowledge, this is the only work of its kind in Europe. In this article, we did not seek to describe the inter-observer variations in further detail, as this has already been done thoroughly in the literature, but rather to highlight the qualities of this collaborative work across the Nord-Pas-de-Calais region.
The goal of the present study was to evaluate the homogenization of delineation among physicians. There is no standard method in the literature for this work; thus, a reference is necessary to calculate the indexes. We hypothesized that the random selection of the same physician would not significantly influence the results. Concerning the "reference" contours from this physician, the differences between the first and second delineations were slight (data not shown). As a limitation of the present study, we cannot confirm that this hypothesis was entirely correct. To overcome this limitation, we compared each delineation with a common contour comprising the delineations of most of the physicians. This method facilitated the evaluation of the harmonization of delineation and avoided the selection of a random or reference contour. The results were similar whichever method was used, with the improvement of some indexes for cases 2 and 3.
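One possible way to build such a common contour is a voxel-wise agreement threshold. The sketch below rests on an assumption: it interprets "a common contour comprising the delineations of 9/14 physicians" as the set of voxels included by at least 9 of the 14 observers. The exact construction used by the comparison software is not described in this text, and the function name `consensus_contour` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def consensus_contour(contours: Iterable[Set[Voxel]], min_votes: int) -> Set[Voxel]:
    """Voxels included by at least `min_votes` observers (assumed definition)."""
    if min_votes < 1:
        raise ValueError("min_votes must be at least 1")
    votes: Counter = Counter()
    for contour in contours:
        # Each contour is a set, so an observer votes at most once per voxel.
        votes.update(contour)
    return {voxel for voxel, count in votes.items() if count >= min_votes}


# Toy example with three observers and a threshold of two votes.
observer_a = {(0, 0, 0), (0, 1, 0), (0, 2, 0)}
observer_b = {(0, 1, 0), (0, 2, 0), (0, 3, 0)}
observer_c = {(0, 2, 0), (0, 3, 0), (0, 4, 0)}
common = consensus_contour([observer_a, observer_b, observer_c], min_votes=2)
```

Under this assumption, the study setting would correspond to `min_votes=9` applied to the 14 observer contours, and each observer contour would then be compared with the resulting common contour instead of with a single physician's contour.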
It is important to note that the volumetric indexes used in our study to compare the CTV delineation are more sensitive than metric (distance-based) ones. For example, the volume overlap (VO) of two volumes overlapping at 85% is 0.74. The VO of two cubes composed of 10×10×10 voxels after a shift of one voxel along the diagonal of the cube is 0.57 (729/1271), whereas the mean distance between the two cubes is only around one voxel [11]. There is no standard index value beyond which the inter-observer variation is considered low. It is commonly accepted that a value greater than 0.6 is acceptable; a value greater than 0.8 is considered good and close to the intra-observer variability. In the present study, using method 1, the DSC values were superior to 0.6 in cases 1 and 3 in the first comparison and in case 2 after the second comparison. The indexes were not improved in case 1, for which the inter-observer agreement was considered good after the first comparison whichever method was used. Some indexes were improved during the second comparison (method 1: VO and DSC in case 2, VR and AV in case 3; method 2: CV, VO and DSC in case 2, VO and AV in case 3).
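The two numerical examples above can be checked with a few lines of exact arithmetic. This sketch assumes that VO is the intersection divided by the union, and that "overlapping at 85%" means two volumes of equal size whose intersection is 85% of each; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def vo_equal_volumes(shared_fraction: Fraction) -> Fraction:
    """VO of two equal volumes whose intersection is `shared_fraction` of each."""
    # Union = V + V - intersection, expressed in units of V.
    return shared_fraction / (2 - shared_fraction)


def shifted_cube_overlap(side: int, shift: int) -> Tuple[Fraction, Fraction]:
    """(VO, DSC) of two identical cubes offset by `shift` voxels on every axis."""
    overlap_side = max(side - shift, 0)
    inter = overlap_side ** 3          # voxels common to both cubes
    volume = side ** 3                 # voxels in each cube
    union = 2 * volume - inter         # voxels in at least one cube
    return Fraction(inter, union), Fraction(2 * inter, 2 * volume)


vo_85 = vo_equal_volumes(Fraction(85, 100))   # 17/23
vo_cube, dsc_cube = shifted_cube_overlap(10, 1)
```

The first value is 17/23, or about 0.74. For the cubes, the intersection is 9×9×9 = 729 voxels and the union is 2000 − 729 = 1271 voxels, giving a VO of about 0.57 for a displacement of a single voxel per axis, which illustrates how strongly volumetric indexes react to small geometric shifts.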
The inter-observer delineation variation was significantly larger in cases 2 and 3 than in case 1 for the two comparisons. Indeed, these cases were more complex, with a delineation based on the pelvic vascular anatomy for case 2 and the lack of a macroscopic target for case 3.
Inter-observer delineation variation and its influence on dosimetry have been shown for many years in almost every tumor location [5][6][7][17][18][19][20][21]. Multimodality fusion can improve homogeneity [22][23][24]. Some studies have shown an improvement in the delineation homogeneity between radiation oncology residents after educational intervention [25,26]. Short-term improvement in head and neck delineation was shown in 11 residents after a teaching intervention; in this study, the evaluation was subjective as contours were scored in a blinded fashion by the investigators [26]. Wide heterogeneity can be observed among the senior radiation oncologists. In the study by Lawton et al., significant disagreement existed in the definition of the CTV for pelvic nodal radiation therapy among genito-urinary radiation oncology experts [7], leading to the development of a consensus [15]. Nevertheless, in some situations, guidelines may vary. Malone et al. compared four consensus guidelines concerning the CTV delineation for post-operative radiotherapy after prostatectomy in 20 patients. The mean volumes (±SD) were 60 (±17) cc and 102 (±24) cc for the smaller and larger ones, respectively, bringing about large differences in the doses delivered to OARs [27].
Based on this observation, scientific societies have implemented delineation courses worldwide; closer to our region, we can mention the online European and French tools as well as the training delivered during their annual conferences [28][29][30][31]. The originality of our work lies in the prospective exchange and collaboration of all physicians across our region in a formal setting. This work is ongoing with head and neck and breast delineation and with a comparison of prostate cancer intensity-modulated radiotherapy optimization based on common volumes. We wish to extend our work to our neighboring region, Picardy, with which a merger is planned.

Conclusion
This prospective study showed that a collaborative discussion concerning clinical cases and the selection of shared guidelines within an established framework improved the homogeneity of the CTV delineation among the senior radiation oncologists in the Nord-Pas-de-Calais region.