A radiomic approach for adaptive radiotherapy in non-small cell lung cancer patients

The primary goal of precision medicine is to minimize side effects and optimize the efficacy of treatments. Recent advances in medical imaging technology allow the use of more advanced image analysis methods beyond simple measurements of tumor size or radiotracer uptake metrics. The extraction of quantitative features from medical images to characterize tumor pathology or heterogeneity may provide information that is useful to guide therapy and predict survival. This paper discusses the rationale supporting the concept of radiomics and the feasibility of its application to non-small cell lung cancer in the field of radiation oncology research. We studied 91 stage III patients treated with concurrent chemoradiation, with an adaptive approach in case of tumor reduction during treatment. We considered 12 statistical features and 230 textural features extracted from the CT images. We used an ensemble learning method to classify patients' data into either the adaptive or non-adaptive group during chemoradiation on the basis of the initial simulation CT. Our data support the hypothesis that a specific signature can be identified (AUC 0.82). In our experience, a radiomic signature mixing semantic and image-based features has shown promising results for personalized adaptive radiotherapy in non-small cell lung cancer.


Introduction
According to the National Institutes of Health (NIH) definition, precision medicine refers to new prevention and treatment strategies that take individual variability into account; it is an approach based on the understanding of individual genes, environment and lifestyle [1].
Precision medicine has been introduced into routine clinical care to minimize iatrogenic damage and reach an optimal therapeutic effect [2]. Achieving this result depends on modern technologies such as genomics, proteomics and radiomics, because they identify "biomarkers": characteristics that are objectively measured and evaluated as indicators of normal biological processes, pathogenic processes or pharmacologic responses to a therapeutic intervention. In recent years, much of the discussion regarding personalized medicine has focused on molecular characterization using genomic and proteomic technologies. These technologies require tissue samples acquired through invasive procedures, and the samples often cover only a small portion of a heterogeneous lesion, so they may not accurately represent the lesion's anatomic, functional and physiologic properties. This limits the use of biopsy-based molecular assays, but it also creates considerable potential for non-invasive imaging techniques, which take the entire volume of disease into account [3]. Recent advances in medical imaging technology allow the use of more advanced image analysis methods beyond simple measurements of tumor size or radiotracer uptake metrics. Radiomics is an emerging field of quantitative imaging that extracts a large set of quantitative features (quantitative biomarkers) from medical images to characterize tumor pathology or heterogeneity (phenotype) [4]. The goal of radiomics is to provide information that can be used to predict survival, as a prognostic marker, and, more interestingly, to guide treatment thanks to its predictive value.
The possibility of predicting response to a treatment would allow therapy to be re-adapted or intensified for the individual patient, offering a greater chance of a better outcome with the aim of changing the prognosis.
Radiomics has several implications in lung cancer. There is incontrovertible evidence of intra-tumoral heterogeneity on CT images of lung cancer patients, and this heterogeneity can be captured with radiomic features. The first radiomic application explored in the literature concerns diagnosis, such as reducing the proportion of non-small cell lung cancers left unclassified as not otherwise specified (NOS) [5] and differentiating lepidic predominant adenocarcinoma [6]. Moreover, it was reported that somatic mutations drive distinct imaging phenotypes in lung cancer, and a radiomic signature was able to successfully discriminate between EGFR+ and EGFR- cases [7]. These artificial intelligence methods could be proposed to assist pathologists and clinicians with histological subtyping and therapy selection in cases of unresectable tumors or scant biopsy material.
The second application of radiomics is the prediction of outcome. Clinical decisions for the treatment of lung cancer are largely based on patient characteristics such as performance status, stage at diagnosis and tumor histology. In metastatic non-small cell lung cancer (NSCLC) patients, molecular information has brought remarkable results thanks to targeted therapies. On the other hand, in locally advanced disease, the standard treatment is concurrent chemoradiation, which is not guided by molecular data in clinical practice.
Several papers have shown that the combination of clinical, genomic, and radiomic features provides a prognostic signature for overall survival [8] or for the prediction of distant metastasis in lung adenocarcinoma treated with chemoradiation [9] and with stereotactic radiation treatment (SBRT) [10][11]. Radiation therapy is by definition a personalized treatment, because the anatomy is specific to each patient and the dose distribution is tailored to the target volume and the organs at risk. However, we do not know at the start of treatment whether a particular patient will respond. We know from literature data that about 30-40% of patients who undergo chemoradiation show a significant reduction of the tumor during treatment [12][13][14][15][16][17]. In this study, we investigated the feasibility of a system in which the radiomic features of the patient's initial imaging are used to predict tumor reduction during chemoradiation.

Patient and CT imaging
We studied 91 stage III patients treated with concurrent chemoradiation. As reported in a previous prospective study of our group [18], 50 patients with stage IIIA/IIIB NSCLC were enrolled from November 2012 to July 2014 and treated with concurrent chemoradiation at radical dose with adaptive approach.
The adaptive approach consisted of implementing a new treatment plan, with which the patient continued radiation therapy, when a reduction in tumor volume was observed (assessed by two radiation oncologists on weekly chest CT simulations). A further 41 patients with the same initial characteristics (PS, stage, age, etc.) who underwent radical concurrent chemoradiation in the same period, but who did not achieve target reduction, were added to the initial group in order to investigate the predictive power of the radiomic features on tumor shrinkage (Table 1). The features investigated were extracted from the initial simulation CT, on which the Clinical Volume was manually delineated by expert radiation oncologists, providing a 3D ROI (Figs 1A, 1B, 2A and 2B). The adaptive protocol was approved by Ethical Committee Campus Bio-Medico University on 30 October 2012 and registered at ClinicalTrials.gov on 12 July 2018 with Identifier NCT03583723 after an initial exploratory phase.
The Institutional Review Board approved this review. Written informed consent was obtained from all patients.
The authors confirm that all ongoing and related trials for this intervention are registered.

Semantic features
Two experienced radiation oncologists (ROs) independently reviewed all CT images and assigned scores to each tumor for nine semantic imaging features, divided into personal data (age, sex and smoking attitude), staging scores of the tumor (T, N and tumor stage), and histology and gene mutation evaluation. Both ROs assigned staging scores blindly; in case of disagreement, they reviewed the CT images together and resolved any discrepancy through discussion until consensus was reached.

Radiomic feature extraction
For each 3D ROI in the images, we computed the radiomic features using our in-house software tool coded in MATLAB (Mathworks Inc, MA, U.S.A.), taking into consideration 12 statistical features and 230 textural features extracted from the CT images. Statistical features include the moments up to the fourth order of the first-order image histogram, i.e., the mean, the standard deviation, the skewness and the kurtosis. Furthermore, the grey-level distribution is also described by the histogram width, the energy, the entropy, the value of the histogram absolute maximum and the corresponding grey-level value, the energy around this maximum, the number of relative maxima in the histogram and their energy [19]. Texture features are derived from the 3D gray-level co-occurrence matrix (GLCM) and from the Local Binary Patterns-TOP (LBP-TOP) [20]. The former represents the distribution of co-occurring values between neighbouring pixels according to different displacements, and its statistics correlate well with the image structure. LBP-TOP descriptors assign to each pixel of the image a label obtained by comparing it with its neighbourhood, computed on three orthogonal planes. The histograms of the LBP distributions in these planes are then concatenated.
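As an illustration of the kind of descriptors described above, the following minimal Python sketch computes a few first-order histogram statistics and one GLCM statistic (inertia) for a single displacement. It is not the in-house MATLAB tool: the function names, the (dz, dy, dx) axis ordering, the use of a non-symmetric co-occurrence count and the definition of inertia as the sum of (i - j)^2 p(i, j) are assumptions made for this example.

```python
import math
from collections import Counter

def first_order_features(voxels):
    """Histogram statistics of the grey levels in a ROI (flat list of ints)."""
    n = len(voxels)
    mean = sum(voxels) / n
    var = sum((v - mean) ** 2 for v in voxels) / n  # population variance
    std = math.sqrt(var)
    if std > 0:
        # Standardised third and fourth central moments
        skewness = sum((v - mean) ** 3 for v in voxels) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in voxels) / (n * std ** 4)
    else:
        # Constant ROI: higher moments are undefined, report 0 by convention
        skewness = kurtosis = 0.0
    # Normalised grey-level histogram
    probs = [count / n for count in Counter(voxels).values()]
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "energy": sum(p * p for p in probs),
        "entropy": -sum(p * math.log2(p) for p in probs),
    }

def glcm_inertia(volume, offset):
    """Inertia of the co-occurrence matrix for one 3D displacement.

    volume is a nested list indexed as volume[z][y][x];
    offset is (dz, dy, dx), an axis ordering assumed for this sketch.
    """
    dz, dy, dx = offset
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    counts = Counter()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                z2, y2, x2 = z + dz, y + dy, x + dx
                if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
                    counts[(volume[z][y][x], volume[z2][y2][x2])] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no voxel pair fits inside the volume for this offset
    return sum((i - j) ** 2 * c for (i, j), c in counts.items()) / total

# Toy data, not patient data
features = first_order_features([1, 1, 2, 2, 3, 3])
toy_volume = [[[0, 1], [0, 1]]]
inertia_x = glcm_inertia(toy_volume, (0, 0, 1))
inertia_y = glcm_inertia(toy_volume, (0, 1, 0))
```

In practice the ROI would be a masked region of the CT volume rather than a full rectangular block, and grey levels are usually quantised before the co-occurrence counts are accumulated.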

Radiomic feature selection
For each ROI, both semantic and radiomic features were grouped in a single array, which contains 251 features. In machine learning, as in this case, it is common to have a feature vector composed of so many elements: the rationale is that practitioners and researchers should define measures that go beyond the human interpretation of the images, as several discriminative features may not be directly accessible to visual analysis. The set of features is then explored by the wrapper method to identify the most discriminative features [21]. A wrapper is a feature selection method that embeds the model hypothesis search within the feature subset search. After a search procedure over the possible feature subsets has been defined, the various subsets are generated and evaluated by training and testing a specific classification model. We selected this approach because it is able both to discover feature dependencies and to exploit the interaction between feature subset search and model selection.
In practice, we applied a leave-one-out (LOO) cross validation approach to evaluate the subsets: at each iteration of this procedure a single instance is removed from the data, creating a fold; the model is then trained on all the remaining instances and the removed instance is used for independent validation. The procedure is iterated over all the instances of the data, so that the number of folds equals the number of instances.
In particular, in the variable selection process we used an LOO loop in which, at each iteration, a wrapper method selected a subset of descriptors. The wrapper was based on a Random Forest and used an inner 2-fold cross validation loop for performance evaluation. This procedure therefore returned the frequency with which each feature was selected across all the iterations of the external LOO loop. The final set of descriptors contained all the features that were selected in at least 10% of the LOO iterations. This choice was motivated by the guidelines reported in [22] and by our prior knowledge of the problem domain.
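The frequency-based selection described above can be sketched as follows. This is a minimal illustration, not the code used in the study: `select_features` is a hypothetical callable standing in for the Random Forest wrapper with its inner 2-fold cross validation, and `toy_selector` is a deterministic stand-in with made-up feature names so that the example runs on its own.

```python
from collections import Counter

def loo_selection_frequency(n_samples, select_features):
    """Outer LOO loop: fraction of folds in which each feature is selected.

    select_features(train_indices) is a hypothetical callable that returns
    the feature names chosen by the wrapper on the given training fold.
    """
    counts = Counter()
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        counts.update(set(select_features(train)))
    return {name: count / n_samples for name, count in counts.items()}

def final_signature(frequencies, min_frequency=0.10):
    """Keep the features selected in at least min_frequency of the folds."""
    return sorted(name for name, freq in frequencies.items()
                  if freq >= min_frequency)

def toy_selector(train_indices):
    """Toy stand-in for the wrapper; feature names are invented."""
    train = set(train_indices)
    chosen = {"f_always"}
    if not {0, 1, 2} <= train:   # chosen in 3 of the 20 folds (15%)
        chosen.add("f_sometimes")
    if 3 not in train:           # chosen in 1 of the 20 folds (5%)
        chosen.add("f_rare")
    return chosen

frequencies = loo_selection_frequency(20, toy_selector)
signature = final_signature(frequencies, min_frequency=0.10)
```

With the toy selector, `f_rare` falls below the 10% threshold and is dropped, while the other two features are retained.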
After this selection procedure, the following features were included in the analysis (Fig 3): five semantic features (sex, N, histology, EGFR mutation, and smoking attitude); two GLCM measures ("absolute_-1,1,-1", representing the textural variability in the direction given by the (-1, 1, -1) versor, and "inertia_0-10", representing the textural homogeneity in the direction given by the (0, -1, 0) versor); four LBP-TOP measures ("range_LBP3_ri", the range of possible patterns in the image without considering their rotation; "skewness_LBP3_u", the asymmetry of the distribution of all patterns after discarding noisy patterns; "mean_LBP3_u", the mean pattern after discarding noisy patterns; and "range_LBP3", the range of possible patterns in the image); and one statistical measure ("numMaxRel", the number of local maxima in the histogram of the image).

Classifier
In our experiments, we used a Random Forest (RF) to classify patients' data into either the adaptive or non-adaptive group. RF is an ensemble learning method for classification that builds a multitude of decision trees at training time and outputs the class that is the mode of the classes predicted by the individual trees. The number of features is denoted as p. The decision trees are built on bootstrapped training samples and, each time a split in a tree is considered, a random subset of m features is chosen, with m < p. All the experiments were conducted according to a leave-one-out cross validation, which provides a nearly unbiased estimate using only the original data, and a .632+ bootstrap validation, which, by contrast, provides a measure with low variance [23].
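The three ingredients of a Random Forest mentioned above (bootstrapped training samples, a random subset of m < p features at each split, and a vote that returns the mode of the tree predictions) can be illustrated with the following minimal sketch. It does not train any decision tree and is not the implementation used in the experiments; the choice m = floor(sqrt(p)), the value p = 12 and the random seed are assumptions made for the example.

```python
import math
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(p, m, rng):
    """Pick m distinct feature indices out of p, as done at each split."""
    return sorted(rng.sample(range(p), m))

def majority_vote(tree_predictions):
    """Return the mode of the class labels predicted by the trees.

    On a tie, the label that appears first in the list is returned.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)        # fixed seed, only for reproducibility
n_patients = 91               # cohort size reported in the text
p = 12                        # assumed number of candidate features
m = int(math.sqrt(p))         # common heuristic, assumed here (m = 3)

sample = bootstrap_indices(n_patients, rng)
subset = random_feature_subset(p, m, rng)
label = majority_vote(["adaptive", "non-adaptive", "adaptive"])
```

Because each bootstrap sample is drawn with replacement, some patients appear more than once in `sample` and others not at all; the latter are the out-of-bag cases for that tree.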

Fig 4 displays the Receiver Operating Characteristic (ROC) curve of the proposed system, whereas Table 2 shows the results, reporting the following performance measures: the area under the receiver operating characteristic curve (AUC), the classification accuracy, the precision, the sensitivity and the Positive and Negative Predictive Values. Each of these metrics was computed by collecting all the predictions at the end of the leave-one-out cross validation. Moreover, we also reported the 95% Confidence Intervals (CIs), computed as reported in [24][25][26][27]. While the second row of the table shows the performance achieved by the radiomic approach described above, the following rows show what happens if some types of features are removed from the original set. In this case we repeated the same procedure explained in the previous sections for feature selection and classification; for the sake of comparison, we kept the threshold for the best subset selection equal to 10%, as before.
In particular, the third row of Table 2 reports the performance when semantic features are neglected, the fourth row shows the scores when GLCM features are discarded and the fifth row shows the scores when LBP-TOP features are not used. Of particular interest is the last row, which reports the scores obtained with only the semantic descriptors used in clinical practice: the semantic features alone perform considerably worse than the radiomic signature. Indeed, using only semantic features, the AUC of the system is about 4% lower than that of the final radiomic signature, and the accuracy is about 6% lower than that of the proposed system.
For the sake of comparison, Table 2 also reports the Positive and Negative Predictive Values (PPV and NPV) [28] for all the experiments. It is worth noting that the PPV is considerably lower than the other metrics reported in Table 2; however, the proposed system still outperforms the other experiments.
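For reference, the performance measures discussed above can be computed from the pooled leave-one-out predictions as in the following minimal sketch. The labels and scores are toy values, not the study data; coding the positive class as 1, the 0.5 decision threshold and the function names are assumptions made for the example. The AUC is computed with the rank-based (Mann-Whitney) formulation, i.e. the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV.

    Assumes every denominator is non-zero (both classes are present
    and both classes are predicted at least once).
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(y_true, scores):
    """Rank-based AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

# Toy pooled predictions (1 = adaptive, 0 = non-adaptive)
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
```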
Moreover, the first row of Table 3 shows the error rate computed with the .632+ bootstrap, which is a method for error estimation with a lower variance than the leave-one-out cross validation [23]. The second row reports the error rate of the LOO cross validation, which can be easily calculated from the accuracy measures in Table 2. Comparing the two experiments, it is clear that the results obtained with the .632+ bootstrap are more biased than those computed with LOO cross validation. This could be expected, since the bootstrap error is usually more biased than the cross validation one, despite its lower variance [23]. It is also worth noting that the accuracies computed from the bootstrap errors fall within the CIs presented in Table 3.
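For readers unfamiliar with the .632+ estimator, the following minimal sketch shows how it combines the training (resubstitution) error with the out-of-bag bootstrap error, following the usual formulation by Efron and Tibshirani. It is not the code used to produce Table 3: the function and argument names are assumptions, and the inputs below are toy numbers. The LOO error rate, by comparison, is simply 1 minus the LOO accuracy.

```python
def no_information_rate(y_true, y_pred):
    """Error expected if predictions were independent of the labels.

    Binary case with labels coded as 0/1:
    gamma = p1 * (1 - q1) + (1 - p1) * q1,
    where p1 is the observed and q1 the predicted fraction of class 1.
    """
    n = len(y_true)
    p1 = sum(y_true) / n
    q1 = sum(y_pred) / n
    return p1 * (1 - q1) + (1 - p1) * q1

def err_632_plus(err_train, err_oob, gamma):
    """.632+ estimate from training error, out-of-bag error and gamma."""
    err_oob = min(err_oob, gamma)  # cap the out-of-bag error at gamma
    if err_oob > err_train and gamma > err_train:
        # Relative overfitting rate, between 0 and 1
        r = (err_oob - err_train) / (gamma - err_train)
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)    # weight given to the out-of-bag error
    return (1 - w) * err_train + w * err_oob

# Toy values, not the study results
gamma = no_information_rate([1, 1, 0, 0], [1, 0, 1, 0])
estimate = err_632_plus(err_train=0.0, err_oob=0.3, gamma=gamma)
```

When there is no overfitting (r = 0) the weight is 0.632 and the estimator reduces to the plain .632 bootstrap; when overfitting is maximal (r = 1) the weight approaches 1 and the estimate equals the out-of-bag error.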
Finally, it is worth noticing that in all the experiments presented in Tables 2 and 3 the proposed system shows performance higher than the other experiments presented, confirming the potential and feasibility of a radiomics-based approach.

Discussion
To the best of our knowledge, this is the first feasibility and hypothesis-generating trial of a radiomic strategy to predict tumor shrinkage during chemoradiation, and our data suggest that a specific signature can be identified (AUC 0.82). Medical imaging can provide a great deal of information beyond volumetric measurements, and this process is referred to as image-based phenotyping. The term phenotype refers to the set of all the characteristics manifested by a living organism, comprising its morphology, its development and its biochemical and physiological properties, including behaviour. This means that behaviour could be predicted starting from how the thing appears to us, from its image, from the phenotype. This idea echoes the Greek "kalòs kai agathòs", literally "beautiful and good", which are the characteristics of beauty according to the archaic Greek conception. With regard to lung cancer we should rather speak of ugly and bad, but in this case too the radiomic signature computed from routine imaging helps to unravel tumor heterogeneity. As reported in the Introduction, this would be useful in many fields, from diagnosis [5][6][7] to the prediction of outcome as a prognostic factor [8][9][10][11]. However, the predictive power of radiomics could also be very useful for the decision-making process in daily practice. An example of this predictive power is the possibility of anticipating pathological response, which is a direct measure of tumor response to neoadjuvant chemoradiation assessed at the time of surgery. It has the potential to be used as a surrogate endpoint for survival and local control, and has been shown to be prognostic for survival in early and advanced stages of NSCLC. It is well known that clinical response very often does not correspond to pathological response after neoadjuvant therapies, due to the low value of traditional re-evaluation imaging (CT and PET-CT) in this setting.
A recent study investigated whether pre-treatment radiomic data are able to predict pathological response after neoadjuvant chemoradiation in patients with locally advanced NSCLC [29]. One hundred and twenty-seven NSCLC patients were included in this study. Fifteen radiomic features were selected and evaluated for their power to predict pathological response. No conventional imaging features were predictive. Seven features were predictive for pathologic gross residual disease (AUC > 0.6, p-value < 0.05), and one for pathologic complete response (AUC = 0.63, p-value = 0.01). Tumors that did not respond well to neoadjuvant chemoradiation were more likely to present a rounder shape (AUC = 0.63, p-value = 0.009) and heterogeneous texture (AUC = 0.61, p-value = 0.03). The ability of radiomics to predict pathologic response on pre-treatment imaging may allow adaptation to a different therapy, if required, for those patients who may not have a complete pathological response to the initial therapy.
Results of the present study should be interpreted in the same way. Prediction of response during treatment is probably the most stimulating challenge because it would allow therapy to be modified while in progress. We know from literature data that about 30-40% of patients who undergo chemoradiation show a significant reduction of the tumor during treatment [12][13][14][15][16][17]. Being able to predict this reduction, rather than the classical response assessed about a month after the end of therapy, would allow a change in therapeutic strategy, for example by intensifying the treatment itself, which is obviously not possible once therapy has been delivered as initially planned. Moreover, it would be a great advantage to know before starting chemoradiation whether a particular patient will experience a tumor reduction that requires a new treatment plan, in order to optimize the workflow. Recently, prediction using radiomic analysis of cone-beam CT images has also been reported [30]. It could, therefore, be possible to modulate treatment strategy, thereby offering the patient the chance to change a poor prognosis. In our experience, a radiomic signature mixing semantic and image-based features is able to predict with good performance whether a particular patient will experience a reduction of the target volume during chemoradiation. In the future, the availability of this information before treatment could allow the specialist to intensify treatment, for instance by modifying total dose, fractionation or the drugs combined with radiotherapy, or even by selecting consolidation therapy such as immunotherapy [31]. In the PACIFIC trial, progression-free survival was significantly longer with durvalumab than with placebo after radical chemoradiation. However, no biomarkers have yet been identified to select patients who could benefit from this treatment, which is not free from side effects.
If the radiomic signature is validated in future studies as a biomarker to predict response and outcome, patients at high risk of recurrence could be identified early and become candidates for consolidation therapy. The identification of an external validation dataset is currently ongoing, although some literature data support the use of the cross validation method [29] as applied in our work. In conclusion, the idea behind this study and the initial results obtained represent an original and innovative topic that opens up new research in the field of personalized medicine.
Supporting information
S1 File. Complete dataset. File containing the entire dataset of the extracted features and the labels for each patient; the ".arff" file format is the input file format for the machine learning software "Weka" used in the experimental process.