Evaluation of a generalized knowledge-based planning performance for VMAT irradiation of breast and locoregional lymph nodes—Internal mammary and/or supraclavicular regions

Purpose: To evaluate the performance of eleven Knowledge-Based (KB) models for planning optimization (RapidPlan™ (RP), Varian) of Volumetric Modulated Arc Therapy (VMAT) applied to whole-breast irradiation including nodal stations (internal mammary and/or supraclavicular regions).

Methods and materials: Six RP models were generated and trained on a data set of 120 VMAT plans using different criteria. Two extra structures were delineated: a PTV for the optimization and a ring structure. Five more models, twins of the previous ones, were created without the need for these structures.

Results: All models were successfully validated on an independent cohort of 40 patients, 30 from the same institute that provided the training patients and 10 from an additional institute, with the resulting plans being of equal or better quality compared with the clinical plans. The internal validation shows that the models reduce the heart maximum dose by about 2 Gy, the mean dose by about 1 Gy and the V20Gy by 1.5 Gy on average. Models R and L, together with model B without optimization structures, gave the best outcomes in 20% of the values compared with the other models. The external validation showed an average improvement of at least 16% for the V5Gy of the lungs in RP plans. The mean heart dose and the V20Gy for the ipsilateral lung were almost halved. The models reduce the maximum dose to the spinal canal by more than 2 Gy on average.

Conclusions: All KB models allow a homogeneous plan quality and some dosimetric gains, as seen in both internal and external validation. Sub-KB models, developed by splitting right and left breast cases or by including only whole breast with locoregional lymph nodes, performed well, comparably to but slightly worse than the general model. Finally, models generated without the optimization structures performed better than the original ones.

although in general today's plan quality is already high. The mechanical performance and the dosimetric accuracy of RP, as well as the improvements in OAR sparing achieved with RP planning, have been verified, showing that RP can be used in clinical practice [34].
The present study aimed to evaluate the performance of eleven KB models created with RP software for VMAT planning optimization applied to whole-breast irradiation including nodal stations: Internal Mammary (IM) and/or SupraClavicular (SC) regions.
The breast is an elective site for KB planning investigation but, to the best of our knowledge, a generalized model for a breast target including the locoregional lymph nodes has never been developed for VMAT treatments. 3D Conformal Radiation Therapy (3DCRT) is still the most widely used technique for breast irradiation, but in some cases it cannot achieve good target coverage while sparing the surrounding OARs. Moreover, 3DCRT plans can be very operator dependent. Patients with particular anatomies or requiring lymph node irradiation, especially of the IM lymph nodes, might benefit most from the VMAT technique [35], given the difficulty of ensuring IM lymph node coverage while limiting central lung depth and maximum heart depth with 3DCRT [36]. VMAT breast treatments provide good target coverage and organs at risk (OAR) sparing [37][38][39][40][41][42][43][44][45][46] but, especially when most of the nearby nodal stations are included, planning is time consuming and requires the delineation of ad hoc additional structures to improve conformity and decrease the dose to the surrounding structures, in particular lungs and heart. Moreover, the enormous variability between patients and geometries must be considered, since it can result in particularly challenging plans for patients with complex anatomies.
Previous studies have already investigated the KB approach for breast cancer, mainly focusing on whole breast with Simultaneous Integrated Boost (SIB) to the tumour bed [47,48]. Another study, by van Duren-Koopman et al., demonstrated clinically competitive and efficient optimization with RP of hybrid VMAT in tangential chest-wall irradiation plus SC nodes [49]. However, it is still unclear whether RP can provide a promising dose solution for VMAT treatment of breast cancer with IM node involvement.
The goal of this study is to evaluate the feasibility of several RP models for irradiation of the whole breast and locoregional nodes, and then to validate the models on internal and external patient cohorts in order to appraise their robustness and flexibility. The end point is to obtain models that successfully and efficiently produce clinically acceptable plans for the breast site, both within the departmental protocol and outside it, and to understand which model performs better in each case.

Methods and materials
A set of clinical plans produced from January 2017 to September 2019 was included in this retrospective study. The selection criteria were: adjuvant radiotherapy for breast cancer; VMAT technique; IMN delineation; and no indication for nodes-only radiotherapy (i.e. the breast gland or chest wall was also included). The geometry and dosimetry of the structure set of each plan were then parameterized and extracted into the models. Each model was built on 52 to 120 plans, depending on the model.
The training phase then begins and, once the training is completed, the model must be evaluated. The statistics integrated in the software identified possible outliers in the regression of the principal components according to Cook's distance, a measure of the influence of individual training-set cases on the regression coefficients, and, where applicable, other statistical parameters such as the studentized residual.
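To make the outlier statistic concrete, the following minimal sketch computes Cook's distance for an ordinary least-squares fit with a single predictor. It is an illustration only and not the RapidPlan implementation, which regresses on principal components; the function name `cooks_distances` and the sample data are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance of each point for a simple linear fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Mean squared error of the fit, with n - p degrees of freedom
    mse = sum(e * e for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of the point in a one-predictor regression
        leverage = 1.0 / n + (xi - x_mean) ** 2 / sxx
        distances.append((e * e / (p * mse)) * leverage / (1.0 - leverage) ** 2)
    return distances


# Hypothetical data: the last case departs strongly from the linear trend
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.0, 2.9, 4.1, 5.0, 12.0]
d = cooks_distances(x, y)
most_influential = d.index(max(d))
```

A case with a large residual and high leverage receives a large distance, which is why such cases are flagged for individual review before being kept in or removed from a training set.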

Model configuration
Patients who received VMAT treatments to breast sites were retrospectively selected by searching in our institution database.
The breast Clinical Target Volume (CTV) is defined as the entire mammary gland. The CTV_surgical bed, if present, is defined as 1 cm around the surgical clips placed in the lumpectomy area. The supraclavicular nodes are defined as the CTV_SC and the internal mammary nodes as the CTV_IM LN, whose extension depends on the prescription. The Planning Target Volumes (PTV) are defined by adding an anisotropic margin: 2 mm in the medio-lateral direction, 5 mm in the antero-posterior direction and 10 mm in the cranio-caudal direction [50]. All the PTVs were cropped 3 mm inside the body outline to exclude the skin and, for SIB cases, the isotropic expansion of the CTV_Boost, named PTV_boost, is subtracted from the sum of the other PTVs. A total dose in the range 57.5-63.22 Gy in 25/29 fractions was prescribed to the boost volume (PTV_boost), with 50-52.2 Gy delivered simultaneously to the whole-breast PTV or to the whole breast with nodal stations. When a single volume was indicated, the dose prescription could be 40.05 Gy in 15 fractions or 50 Gy in 25 fractions. In detail, breast treatments included in the models are: All plans were delivered in our department and therefore approved by a radiation oncologist. The models were trained with plans selected to include a wide range of cases representative of our clinical practice. All patient data were anonymized.
A potentially critical point in an automated process is whether the same optimization and calculation algorithms are used to generate the plans that feed the model, during the validation phase and in the clinical implementation. In the present work, the clinical plans were generated with the PRO optimization engine and the AAA algorithm, while in the RP validation the PO optimizer (PO version 13.6.23, Varian Medical Systems, Inc., Palo Alto, CA, USA) and the Acuros dose calculation algorithm (version 13.2.63, Varian Medical Systems, Inc., Palo Alto, CA, USA) were used. PO was found to overcome PRO limitations for VMAT planning [51,52] and Acuros is more accurate than AAA [53,54]. The initial clinical plans might therefore have been better had they been optimized with the PO-Acuros pair. To exclude this possibility, and to avoid ascribing the improved quality of the RP-generated plans to the different optimizer (although the algorithm differences should in principle have no real impact on the use of RP), we recalculated the original plans with PO and Acuros before including them in the models. In this way, all comparisons were between plans consistently generated by the PO optimizer and computed with Acuros, which was used for both the final and the intermediate dose calculation.
Approaches such as Deep Inspiration Breath-Hold (DIBH) techniques during VMAT irradiation are suggested in the literature [55][56][57] to avoid significant variations in the dose to the PTV and to reduce the dose delivered to the heart and lung volumes, in particular for left-sided breast cancer. However, in our data set only two patients were treated with this technique, mainly because very good patient compliance is necessary to ensure optimal delivery.
VMAT plans were optimized for 6 MV photon beams with two or three partial arcs, collimator angles of 20-30˚/330-340˚ and the isocenter suitably placed in the target. Additional partial arcs were added in some more challenging cases, always within the limits of anterior/oblique to posterior incidence. All plans were normalized to the mean dose to the PTV, as per institutional policy in clinical routine and in compliance with the ICRU recommendations. The Acuros-XB dose calculation algorithm was adopted with a dose grid resolution of 2.5 mm, equal to the slice thickness of the CT images (GE Optima 660) on which the dose was calculated.
One hundred and twenty VMAT plans (60 left-sided, 60 right-sided breasts), 62 of them with SIB, were selected for the training of the DVH estimation models. These plans were manually produced by expert planners and approved by a radiation oncologist according to the standard procedures of our department. All the plans selected for model training were checked for quality before their inclusion in the model, in terms of the maximum ($D_{max}$) and mean dose ($D_{mean}$) of the PTVs and OARs and of the dose-volume parameters of PTVs and OARs required by our clinical protocol (QUANTEC). The following parameters were also calculated:
■ Homogeneity Index HI, defined as $\mathrm{HI} = (D_{2\%} - D_{98\%}) / D_{50\%}$, where $D_{98\%}$, $D_{2\%}$ and $D_{50\%}$ are the doses received by 98%, 2% and 50% of the PTV respectively
■ Homogeneity Index $\mathrm{HI}_{95}$, defined in terms of $D_{95\%}$ and $D_{5\%}$, the doses received by 95% and 5% of the PTV respectively
■ The 95% isodose Conformity Index $\mathrm{CI}_{95}$, defined as $\mathrm{CI}_{95} = V_{95\%} / V_{PTV}$, where $V_{95\%}$ is the volume covered by 95% of the prescribed dose and $V_{PTV}$ is the PTV volume
■ The 100% isodose Conformity Index $\mathrm{CI}_{100}$, defined as $\mathrm{CI}_{100} = V_{100\%} / V_{PTV}$, where $V_{100\%}$ is the volume covered by 100% of the prescribed dose and $V_{PTV}$ is as previously described

Six different RP models were generated and trained on the data set of 120 VMAT plans:
1. Model B (120 VMAT plans) includes plans with whole right and left breast irradiation with locoregional lymph nodes.

For RP-generated plans, only two extra structures were delineated, already a step ahead of the 8 to 10 structures usually created in the reference plans. The first structure was the PTV enlarged with a margin of 1 mm and cropped 3 mm from the Body; for SIB cases a Boolean difference is performed between the PTV_all and the PTV_boost with a margin of 3 mm.
The second was a ring structure, defined as an expansion of the PTV starting 0.3 cm from the PTV edge and 3 cm thick, to ensure good dose conformity together with an appropriate NTO choice (with a fall-off of 0.7).
The other five models, twins of the previous ones (except for the SIB model), were created so that the two aforementioned extra structures do not have to be contoured and included during the optimization phase. Below, these models are named like their twins with the suffix "No OS", meaning "without optimization structures".
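The homogeneity and conformity indices used for the plan-quality check earlier in this section can be sketched as follows. This is a minimal illustration that assumes the dose distribution is available as a flat list of doses on equal-volume voxels; the function names and the sample numbers are hypothetical, and HI95 is omitted because its formula is not given in the text.

```python
import math


def dose_at_volume(voxel_doses, volume_percent):
    """D_x%: minimum dose received by the hottest x% of equal-volume voxels."""
    ranked = sorted(voxel_doses, reverse=True)
    count = max(1, math.ceil(volume_percent * len(ranked) / 100))
    return ranked[count - 1]


def homogeneity_index(ptv_doses):
    """HI = (D2% - D98%) / D50%, evaluated on the PTV voxels."""
    d2 = dose_at_volume(ptv_doses, 2)
    d98 = dose_at_volume(ptv_doses, 98)
    d50 = dose_at_volume(ptv_doses, 50)
    return (d2 - d98) / d50


def conformity_index(body_doses, voxel_volume_cc, ptv_volume_cc,
                     prescribed_dose, isodose_percent):
    """CI = V_isodose / V_PTV, e.g. isodose_percent=95 for CI95."""
    threshold = prescribed_dose * isodose_percent / 100
    # Volume of all body voxels receiving at least the isodose level
    covered_cc = sum(1 for d in body_doses if d >= threshold) * voxel_volume_cc
    return covered_cc / ptv_volume_cc


# Hypothetical example: 5 body voxels of 1 cc, a 3 cc PTV, 50 Gy prescription
ci95 = conformity_index([50, 50, 48, 47, 10], 1.0, 3.0, 50, 95)
```

An HI close to 0 indicates a homogeneous target dose, while a CI close to 1 indicates that the isodose volume matches the PTV volume.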
The OARs included in the training phase are: ipsilateral and contralateral lungs, their sum structure as lungs, contralateral breast, heart, spinal canal, Left Anterior Descending Coronary Artery (LADCA), esophagus and thyroid. Upper, lower, mean and line optimization objectives and their priorities were created in the model configuration for target and OAR that aim to achieve the standard protocol objectives of the department. For the serial organs, where point maximum dose constraints were the only constraints in the departmental protocol, a fixed upper objective and priority was used. For heart and lungs, where the accepted dose was more influenced by geometric factors, the RP model was used to generate a line objective and priority.
Outliers that are not properly checked could lead to incorrect optimization line objectives for the estimated structures. Any contours highlighted as outliers in the RP statistics were individually assessed and removed from the model if they were confirmed to be outliers. No whole plans were directly removed during the training process for the breast models, unless the number of outlier structures exceeded half of the total number of structures, in which case the whole plan was removed.
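The plan-level rule described above (drop a whole plan only when more than half of its structures are outliers) can be written as a short filter. This is a sketch of the stated rule only; the data layout, function names and plan identifiers are hypothetical.

```python
def should_remove_plan(n_outlier_structures, n_structures):
    """True when outlier structures exceed half of the plan's structures."""
    return n_outlier_structures > n_structures / 2


def filter_training_plans(plans):
    """Keep plans that pass the rule.

    `plans` maps a plan identifier to a tuple
    (number of outlier structures, total number of structures).
    """
    return {
        plan_id: counts
        for plan_id, counts in plans.items()
        if not should_remove_plan(*counts)
    }


# Hypothetical training set with 9 structures per plan
training = {"plan_A": (1, 9), "plan_B": (5, 9), "plan_C": (4, 9)}
kept = filter_training_plans(training)
```

Note that exactly half is not enough to remove a plan, because the rule requires the count to exceed half of the structures.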
The choice of the objectives and priorities adopted to create a model is an additional important factor for model quality. Placing the line objective below the lower boundary of the predicted DVH improves the average plan quality. The good results of the plans generated with RP could come from the combination of the two objectives included in the model: the generated line objective and the mean objective, both with generated priorities. Line objectives of a specified OAR refer to the 'most-likely occurring' DVH curve within the estimated DVH range and correspond to the lower edge of that range (the range being the mean estimated DVH ± one standard deviation).
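As a rough illustration of where such a line objective sits, the sketch below takes the lower edge of an estimated DVH band as the mean estimate minus one standard deviation in each dose bin. It assumes the mean and standard deviation are given as relative volumes (%) on a common dose grid; the function name and the numbers are hypothetical and do not reproduce the internal RP computation.

```python
def line_objective_lower_edge(mean_dvh, sd_dvh):
    """Lower edge of the estimated DVH band: mean minus one SD, floored at 0%."""
    if len(mean_dvh) != len(sd_dvh):
        raise ValueError("mean and SD curves must share the same dose grid")
    return [max(0.0, m - s) for m, s in zip(mean_dvh, sd_dvh)]


# Hypothetical estimate on three dose bins (relative volume in %)
mean_estimate = [100.0, 60.0, 5.0]
sd_estimate = [0.0, 10.0, 8.0]
lower_edge = line_objective_lower_edge(mean_estimate, sd_estimate)
```

Flooring at zero keeps the objective physically meaningful in the high-dose tail, where the standard deviation can exceed the mean estimated volume.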
After accomplishing several fine-tuning tests by planning sample patients, the KB-based template for planning optimization was finally generated and used for automatic optimization.

Model validation
The validation phase consists of using the trained models to estimate DVHs on a group of patients with similar characteristics compared to those used to train the models. A set of 40 plans, not used for training the models, were selected: 30 (12 left, 18 right) from our department (clinic 1) and 10 (all right) from another institute (clinic 2). All clinical plans were approved for use ("Reference" plans in the following) and re-optimized with the above detailed RP models.
Sharing the model between centers was quite easy, since all the necessary data can be exported in binary encrypted format from one center and simply re-imported into the Eclipse planning system of the destination center. No exchange of patient data was necessary for the purpose. The affinity among the centers would imply reasonably similar practice and protocols and some homogeneity in the patient population. No special conditions were imposed on the testing center to strictly adhere to the model definitions, in terms of contouring rules for example; rather, the aim of the study was to appraise the possibility of using the same model in a real-world environment, mimicking routine practice in different institutes.
The DVHs of the clinical plans in the validation patients were compared with the estimated DVHs obtained from each model. During the RP-based optimization, no changes to the objectives or priorities were allowed, to exclude any operator-dependent bias.
Standard quantitative and qualitative assessment of the DVHs was performed by inspecting the above-mentioned dose-volume parameters, both for the targets (coverage and homogeneity) and for the OARs (metrics meaningful for organ sparing), to appraise the quality of the model-based optimized plans against the clinically accepted benchmark. The parameters are reported in more detail in the tables of results.

Statistical analysis
Statistical analysis was performed to compare the dosimetric parameters of RP plans and manual plans. The Shapiro-Wilk test (OriginPro version 8.1, OriginLab) was used to verify the normality of the data. For normally distributed data, paired t tests were used to compare the parameters; for non-normally distributed data, the Wilcoxon signed-rank test was used. The tests assumed a null hypothesis, and the difference was considered statistically significant at p<0.05 (**) and highly significant at p<0.01 (***); a tendency to significance is also pointed out for p<0.1 (*).
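The decision flow described above can be summarized in a few lines. The sketch below only encodes the choice of test and the significance thresholds; it does not compute the tests themselves, which the authors ran in OriginPro. The function names are hypothetical, and the 0.05 cut-off applied to the normality p-value is an assumption, since the text does not state it.

```python
def choose_paired_test(normality_p_value, alpha=0.05):
    """Pick the paired test from a Shapiro-Wilk p-value.

    alpha is an assumed cut-off: normality is not rejected when p >= alpha.
    """
    if normality_p_value >= alpha:
        return "paired t-test"
    return "Wilcoxon signed-rank test"


def significance_label(p_value):
    """Map a p-value to the significance levels used in this study."""
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    if p_value < 0.1:
        return "tendency to significance"
    return "not significant"


# Hypothetical values for one dosimetric parameter
test_name = choose_paired_test(normality_p_value=0.003)
label = significance_label(p_value=0.00002)
```

Keeping the two steps separate mirrors the text: the normality check decides which paired test is run, and the resulting p-value is then graded against the three thresholds.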

Internal validation
The models were validated on an independent cohort of 30 patients of our department, with the resulting plans being significantly faster and of equal or better quality compared with the clinical plans.
The model training statistics given by the system showed acceptable model fit with, among the other parameters, an average chi-square (Pearson) for the regression model parameters of It is interesting to note that almost all of these outliers turned to a good value for the specific OAR, i.e. the value of the corresponding OAR fell within the right range of constraints after the optimization phase ("green value"). In detail, depending on the model, 66.7-100% of the red outliers turned into a "green value" after the optimization for all the OAR parameters, and 100% of the yellow outliers turned into a "green value". For the SIB model only, 16.7% of the yellow outliers led, after the optimization, to some parameters of the specific OAR falling outside the optimal dosimetric range. More details are included in S4 and S5 Tables in S1 File.
In general, all KB-based plans were clinically acceptable in terms of PTVs coverage and OAR sparing. The PTV/OARs average dose-volume objectives were used to appraise the quality of the reference and RP dose distributions and were quantitatively analysed for PTV and OAR to investigate the differences. All DVH parameters are listed in detail in Tables 1-4 and S1-S4 Tables in S1 File. Average DVHs of PTVs and OARs for RP plans were compared to the reference plans and are shown in Fig 1 and in S1-S3 Figs.
About 1% of the 1141 analysed dose-volume objectives in the clinical plans failed to reach the optimal constraints, while the respective RP plans succeeded (more details are reported in S6 Table in S1 File). In 6 out of 30 plans of the evaluation data set (reduced to 2 or 0 out of 30 for the models without optimization structures), the RP plan failed at least one of the dose-volume constraints compared to the delivered plan, but with a mean difference of less than 6%. Conversely, when RP brings the constraints below the optimal value, the average difference is up to 20%. The analysis showed that RP-based optimization leads to modest but systematic improvements in OAR sparing. Quantitative improvements were observed in RP plans, especially for heart and spinal canal doses. As shown in Tables 1 and 3 and S1 Table in S1 File, the models reduce the heart maximum dose by about 2 Gy, the mean dose by about 1 Gy and the V20Gy by 1.5 Gy on average. Moreover, the maximum dose to the spinal canal is about 1 Gy lower on average, although the difference is not statistically significant. Small improvements are observed for the V5Gy and the Dmean of the lungs in RP plans, although they are significant only for the B model. Some OARs, such as the contralateral breast or the LADCA, showed slight dose increases for the LN and IM LN models. The same parameters are slightly but statistically significantly improved in model B, for example the V10Gy for the contralateral breast or the Dmax for the LADCA. Almost no statistically significant results were observed for the ipsilateral lung, spinal canal, oesophagus or thyroid.
These improvements are confirmed in the models created without any other optimization structures, corroborating the general idea that RP allows a gain also in planning time and gives planners the possibility not to use additional optimization structures to better conform the dose in the target and to better spare the OARs.
Model B without optimization structures results in better OAR sparing in 30% of the values compared with the other models. Considering both OAR sparing and PTV coverage, models R and L, together with model B without optimization structures, give the best outcomes in about 20% of the values compared with the other models, as can be seen in Fig 2.

External validation
The above-mentioned results are confirmed by the external validation, performed on 10 plans, all right-sided breasts, of which 9 were treated with SIB in 25 fractions with a total dose prescription of 50/57.5 Gy (2/2.3 Gy per fraction) and 1 was a single breast volume without boost treated in 16 fractions for a total dose of 42.72 Gy (2.67 Gy per fraction). The treated volumes comprise the whole right breast without lymph nodes, where indicated with a boost volume around the surgical clips simultaneously integrated into the irradiation protocol. The contouring strategies were not changed, but the planning ones were slightly modified, since the protocol followed by the medical physicists of clinic 2 is not QUANTEC but RTOG 1005 [58] and the VMAT constraints shown in Boman et al [59].   Table in S1 File). These percentages are far higher than the same values observed in the internal validation.
Even more interestingly, all of these outliers turned to good values for the specific OAR: 100% of the red and yellow outliers turned into a "green value", i.e. the value of the corresponding OAR fell within the right range of constraints after the optimization phase.
In perfect agreement with the internal validation, all KB-based plans were clinically acceptable in terms of PTV coverage and OAR sparing. The PTV/OAR average dose-volume objectives were used to appraise the quality of the reference and RP dose distributions and were quantitatively analysed for PTV and OAR to investigate the differences. All DVH parameters are listed in detail in Table 5.
About 0.3% of the 292 analysed dose-volume objectives in the clinical plans failed to reach the optimal constraints, while the respective RP plans succeeded. In 2 out of 10 plans of the evaluation data set (1 objective for each validated model), the RP plan failed at least one of the dose-volume constraints compared to the delivered plan in terms of PTV coverage, with a mean difference of less than 10%. Conversely, when RP brings the constraints below the optimal value, the average difference is up to 20%; this happened for the V5Gy of the lungs of one analysed patient, which went from 61% in the reference plan to 35% with the B model (39% with the R model).
The external validation showed that RP-based optimization leads to systematic improvements in OAR sparing, with sometimes a poorer, but still clinically acceptable, PTV coverage. This sparing is greater than that observed in the internal validation. An average improvement of at least 16% is observed for the V5Gy of the lungs in RP plans for both the B and R models. The mean heart dose was almost halved with both models, and likewise the V20Gy for the ipsilateral lung. As shown in Table 5, the models reduce the maximum dose to the spinal canal by more than 2 Gy on average. All these improvements are statistically significant, with p values often of the order of $10^{-5}$ (marked with (***) in Table 5). Probably because of the relatively small sample of the validation test, most of the data were not normally distributed according to the Shapiro-Wilk test, so the Wilcoxon signed-rank test was applied.

Pie charts show the difference Δ between reference and RP plans, i.e. for every parameter listed in the above tables, how many times the difference Δ is better for one model than for another. Pie chart (a) refers to the Δ of Table 1, pie chart (b) is weighted for the Δ of OARs only, pie chart (c) refers to the Δ of Table 2 for R treatments and pie chart (d) refers to the Δ of Table 2

Discussions
The RP engine has been widely studied in recent years and applied to different sites: pelvis [60][61][62], esophagus [63], head and neck [64,65], lung [66], spine [67] and brain [68]. The breast site has also been investigated [47][48][49] but, to the best of our knowledge, not for VMAT treatments including locoregional lymph nodes. These publications showed that the quality of KB plans, on average, exceeded that of the corresponding clinically accepted plans. The improvements observed in all the aforementioned studies were partly due to the use of line optimization objectives defined slightly below the estimated DVH lower bound, i.e. the optimization is driven towards the best estimated DVH. DVHs of OARs can be estimated using RP models, trained using PCA and stepwise regression analysis. Since the training set can be built in many ways, the performance of a model needs to be evaluated, and this has already been done for all the sites listed above. The results showed that plans generated with the assistance of RP exhibited improved dosimetric performance compared to the benchmark clinically accepted plans, while highlighting the need to properly identify outlier plans in order to better implement KB planning in clinical practice.
Nevertheless, almost the totality of the outliers flagged at the beginning of the optimization process turned to good values for the specific OAR, or the value of the corresponding OAR fell in the right range of dose constraints after the optimization phase. This fact implies that even if a structure is considered "borderline" from the model statistic for its geometric or statistic aspects, at the end that structure can be properly optimized and it can enter the right range of acceptable values for the treatment plan.
Chang et al. showed that the performance of RP in achieving dose constraints is still behind that of an experienced planner and that manual touch-up may be necessary, although RP-based plans produced with a single optimization and no modifications can be clinically acceptable [69]. The present results were indeed generated with no user interaction during the optimization run. In a few of the breast cases, minor refinements are still required to smooth out small hotspots or to boost coverage in small regions. Patients for whom some compromise is required due to proximity or overlap of OARs with PTVs are likely to require further interaction and clinical decisions to compromise either coverage or OAR constraints, but initial optimization using the RP models gives an excellent starting point. For those RP plans that could not fulfil the plan acceptance criteria, minor manual touch-up was sufficient to make them clinically acceptable, with the same quality as those not requiring manual touch-up. Residual failures might be due to insufficient predictive power of the model, which could be addressed by using a larger training sample. In addition, there might be room to further refine the model with the information gathered in this study.
The "breast model" describes models applicable to breast cases regardless of the dose prescription, of the boost (whether simultaneous integrated or sequential), of whether lymph nodes are included, etc. The training set was determined without special selection criteria, other than that of feeding the models with previously treated plans, which is strongly desirable since these plans reflect treatment techniques and constraints that are clinically acceptable and comply with our protocol. The guideline followed for the study was to include in the training set an adequate representation of the population to be sampled. The patient datasets were intentionally generated with a heterogeneous population in terms of tumor location, size and dose prescriptions (from 40.05 to 63.22 Gy, to a single volume or in SIB). As shown in this study, with judiciously chosen optimization objectives and a suitable training set, the challenges due to the trade-off between coverage needs and OAR tolerances can be overcome.
RP plans are able to meet stringent OAR constraints while still maintaining good PTV coverage. The range of patient geometries in the model libraries may still not represent the full diversity of breast cancer cases, owing to individual differences. Therefore, special caution should be taken when applying the model libraries to patients whose geometry falls outside the range of the constituent plans. Removing geometric outliers reduces the variation of the anatomy within a model and thus results in more models being necessary to cover all cases. For this reason, Sheng et al [60] called for further investigation of the idea of building one model that can predict equally well for all cases, to clarify whether it is more advantageous to create a model on individual sites or on a combination of cases from some or all sites. This is what we attempted to do in our preliminary study. Our results reinforce the possibility of building effective broad-scope models and suggest that, to some extent, the use of heterogeneous datasets (in their geometric and dosimetric aspects) might be useful if not necessary.
The correlation between the heterogeneity of the input data, the number of training cases needed and the generalization power of the models was investigated by generating different models for the different treatments listed above. We showed that sub-KB models, developed by including only one type of treatment at a time (i.e. the LN and IM LN models), performed well, comparably to but slightly worse than the general model. This could be because these models include half of the plans of the general one, but could also be due to the very good generalization power inherent in RP, since the general model was trained with a cohort of mixed cases with equivalent incidence of all classes. As these preliminary findings indicate, a training set that samples the patient population with an adequate case mix can be used for general purposes and works better than a more specific model. Special mention goes to the splitting of right and left treatment sites: the R and L models performed as well as model B, with the R model performing even better than model B. In particular, the R model could be improved by changing priorities, especially for the heart objectives, since in this preliminary study the priorities and the objectives were chosen to be the same for every model in order to better compare the models.
Another important point is the impact of the size of the training set on the robustness of the KB model prediction. In fact, the number of patients used for training is a critical issue for the resulting quality of KB models. Cagni et al demonstrated the dependence of the DVH prediction performance on the number of training patients in the model, the prediction being more accurate with training sets of ≥45 plans [23]. This result is in agreement with our RP models, which consist of at least 52 training plans. However, even with a training set of 114 patients, clinically relevant inaccuracies in predicted DVHs were observed, as we also saw in our general model (B), trained with 120 plans.
Shubert et al performed a side cross-validation test to validate the usability of the same model irrespective of the beam energy selected for the plans [26], and no differences were observed between the plans optimized for 6 or 15 MV photons, all based on the same model. Huang et al found that an RP model configured with flattened high-energy beams does not satisfy target dose coverage using un-flattened photons and may increase normal tissue exposure if applied to optimize lower-energy beams [70]. In clinical use of the models, 10 MV photons have occasionally been preferred to 6 MV for exceptionally large breasts, and RP has been found to be effective in these cases as well.
Robust models were created by removing influential outliers while keeping plans that provide additional information, so that the models can be exploited by a potentially large number of users and, in this study, by other institutions as well. The models were validated on two groups of cases not used for training: one set from the clinic that trained the models and one set from an external clinic, similar to what was done in previous studies [24,26,27,29,30,31]. Our results confirmed that the models, built with at least 52 patients from clinic 1, were adequate to properly optimize plans from clinic 2. The OAR sparing obtained using the models in clinic 2 was even greater than that achieved in clinic 1. Notably, the validation plans in clinic 2 did not include lymph node irradiation, so the models, created to work in more difficult cases, performed even better in simpler cases. The findings reported here confirm that models generated and tested by clinic 1 can be used in clinic 2; in addition, adherence to the same guidelines could be facilitated and strengthened by the use of KB planning methods in a multicentric cooperative initiative [23]. Hence, RP may provide uniform plan quality across many centers. The multicentric validation demonstrated the possibility of sharing models among different institutes and highlighted the importance of accurate validation of KB models [27].
The development and implementation of heart-sparing techniques in breast RT remain a priority. Breath-hold techniques [71][72][73][74][75] and VMAT [76] improve heart sparing and decrease the probability of cardiac complications, in particular when the tumor bed is close to the heart or when IM nodes are irradiated. Decreases in heart Dmean of more than 10% with KB plans were found in almost all plans, relative to an accepted average Dmean value of 10 Gy. Darby et al. showed a linear increase in the relative rate of major coronary events with heart Dmean, with an excess relative risk of 7.4% per Gy [77]. On this basis, the average reduction of 1.44 Gy obtained with the L model without optimization structures, which is the best reduction achieved for this parameter, could represent a decrease of approximately 10.7% in the relative risk.
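The arithmetic behind this estimate can be sketched as follows. This is only a minimal illustration, assuming the linear relationship between heart Dmean and the relative rate of major coronary events reported by Darby et al. [77]; the function name `relative_risk_change` and its parameters are hypothetical and were not part of the study workflow.

```python
def relative_risk_change(delta_dmean_gy: float, err_per_gy: float = 0.074) -> float:
    """Change in relative risk of major coronary events for a given change
    in mean heart dose (Gy), assuming a purely linear dose-risk model.

    err_per_gy is the excess relative risk per Gy (7.4%/Gy, from [77]).
    """
    return delta_dmean_gy * err_per_gy


# Average heart Dmean reduction reported for the L model
# without optimization structures (1.44 Gy).
decrease = relative_risk_change(1.44)
print(f"Estimated decrease in relative risk: {decrease * 100:.1f}%")
```

Under these assumptions the product 1.44 Gy × 7.4%/Gy gives about 10.7%, expressed relative to the baseline rate of major coronary events; it is a linear estimate and not a patient-specific prediction.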
Conventional optimization techniques often required two resource-intensive processes. The first was the creation of multiple "dummy" structures to aid the optimization process, for example structures that account for the overlap between OAR and PTV, ring structures to better conform the isodoses, and optimization PTVs to help with PTV coverage. We showed that these structures are not necessary with RP by creating two twin sets of models, with and without the optimization structures, and obtained far better results with the latter. The second was the use of a set of optimization objectives that required iterative adjustment on a patient-by-patient basis until a clinically acceptable plan was achieved. With RP, these iterative steps are removed because the DVH objective generation is automatically tailored to each patient. With a suitable training set and an iteratively adjusted plan optimization parameter template, treatment plans achieved satisfactory DVH objectives even after a single optimization.
Planning time was not part of the study design, but some considerations are reported in the supporting information. In general, any increase in planning time due to the manual touch-up that may be needed to reach an optimum plan after RP optimization is negligible compared with the total planning time for the manual plans.
Concerning the main aims of the study, the RP KBP approach was shown to be robust with respect to:

Conclusions
These are the first RP models that consider whole-breast irradiation including nodal stations with the VMAT technique. All KB models used for planning provide homogeneous plan quality and some dosimetric gains, as seen in both the internal and external validations. The results presented here support the conclusion that KB planning systems can improve the mean plan quality of a single institution and of other institutions with similar protocols. Sub-KB models, developed by splitting right and left breast cases or by including only whole breast with locoregional lymph nodes, showed good performance, although slightly worse than that of the general model. Finally, models generated without the optimization structures performed better than the original ones. This KB approach effectively refines plan optimization and could be helpful in clinical practice, reducing the dependence of plan quality on planner skill and thus increasing the robustness and homogeneity of the radiotherapy process. The models can also be regarded as powerful tools for knowledge sharing and early education, in particular in a center whose planners have different levels of expertise, in order to standardize and improve the quality of the plans. The external validation of the models by another center supports the generalization power of RP, although this study should be considered one of the many feasibility studies contributing to the evidence for this statement. Further external validations by other centers would help confirm the robustness of the proposed RP models.