High-order radiomics features based on T2 FLAIR MRI predict multiple glioma immunohistochemical features: A more precise and personalized gliomas management

Objective To investigate the performance of high-order radiomics features and models based on T2-weighted fluid-attenuated inversion recovery (T2 FLAIR) imaging in predicting the immunohistochemical biomarkers of glioma, with the aim of enabling non-invasive, more precise and personalized glioma management. Methods Fifty-one patients with pathologically confirmed glioma admitted to our hospital from March 2015 to June 2018 were retrospectively analyzed, and Ki-67, vimentin, S-100 and CD34 immunohistochemical data were collected. The volumes of interest (VOIs) were manually delineated and the radiomics features were extracted. Feature reduction was performed by ANOVA + Mann-Whitney tests, Spearman correlation analysis, the least absolute shrinkage and selection operator (LASSO) and the gradient boosting decision tree (GBDT) algorithm. The SMOTE technique was used to correct the data imbalance between the two groups. Comprehensive binary logistic regression models were established. Area under the ROC curve (AUC), sensitivity, specificity and accuracy were used to evaluate the predictive performance of the models. Model reliability was judged according to the standard net benefit of the decision curves. Results Four clusters of significant features were screened out and four predictive models were constructed. The AUCs of the Ki-67, S-100, vimentin and CD34 models were 0.713, 0.923, 0.854 and 0.745, respectively. The sensitivities were 0.692, 0.893, 0.875 and 0.556, respectively. The specificities were 0.667, 0.905, 0.722 and 0.875, with accuracies of 0.660, 0.898, 0.738 and 0.667, respectively. According to the decision curves, the Ki-67, S-100 and vimentin models had reference value. Conclusion Radiomics features based on T2 FLAIR can potentially predict Ki-67, S-100, vimentin and CD34 expression. Radiomics models are expected to become a computer-intelligent, non-invasive, accurate and personalized management method for gliomas.


Histogram Parameters
Histogram parameters are concerned with the properties of individual pixels. They describe the distribution of voxel intensities within the image through commonly used, basic metrics. Let $X$ denote the three-dimensional image matrix with $N$ voxels and $P$ the first-order histogram divided into $N_l$ discrete intensity levels, so that $P(i)$ is the fraction of voxels in intensity level $i$. The following first-order statistics were extracted:

1.1 Energy:
The energy feature measures the uniformity of the intensity level distribution. If the value is high, the distribution is concentrated in a small number of intensity levels. Energy can be defined as: $\mathrm{Energy} = \sum_{i=1}^{N_l} P(i)^2$

Entropy:
The entropy measures the randomness of the distribution of the voxel values over the intensity levels. If the value of entropy is high, the distribution is spread among more intensity levels in the image. This measure behaves inversely to energy: a simple image has low entropy while a complex image has high entropy. Entropy can be defined as: $\mathrm{Entropy} = -\sum_{i=1}^{N_l} P(i)\,\log_2 P(i)$
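The energy and entropy descriptions above can be illustrated with a minimal NumPy sketch. It assumes that both features are computed from the normalized intensity histogram and that entropy uses a base-2 logarithm; the function name and the default bin count `n_levels=32` are hypothetical choices for illustration, not settings taken from the AK software.

```python
import numpy as np

def histogram_energy_entropy(voxels, n_levels=32):
    """Return (energy, entropy) of the discretised intensity histogram.

    n_levels is a hypothetical bin count chosen for illustration only.
    """
    x = np.asarray(voxels, dtype=float).ravel()
    counts, _ = np.histogram(x, bins=n_levels)
    p = counts / counts.sum()            # fraction of voxels per intensity level
    energy = float(np.sum(p ** 2))       # close to 1 when few levels dominate
    nz = p[p > 0]                        # empty levels contribute 0 to entropy
    entropy = float(-np.sum(nz * np.log2(nz)))
    return energy, entropy

# A constant region is maximally uniform; four equally filled levels are not.
print(histogram_energy_entropy([5, 5, 5, 5]))
print(histogram_energy_entropy([0, 1, 2, 3], n_levels=4))
```

A constant region gives the highest energy and the lowest entropy, while spreading the voxels evenly over more levels lowers energy and raises entropy, which matches the inverse behaviour described in the text.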

MaxIntensity:
The maximum intensity value of the image matrix $X$.

MinIntensity:
The minimum intensity value of the image matrix $X$.

MeanValue:
The mean measures the average value of the voxel intensities.

Mean absolute deviation:
The mean of the absolute deviations of all voxel intensities around the mean intensity value.

MedianIntensity:
The median intensity value of the image matrix $X$.

Range:
The range of intensity values of the image matrix $X$.

1.10
Standard deviation (stdDeviation): a measure used to quantify the amount of variation or dispersion of a set of data values, here the voxel intensities $X$ around their mean $\bar{X}$.

1.12
Variance: the average of the squared differences of the voxel intensities $X$ from their mean $\bar{X}$.

Volume Count
Describes the size of the ROI.

Voxel Value Sum
Represents the sum of the voxel values in the ROI.
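As a rough illustration of the simple first-order statistics listed so far, the sketch below computes them with NumPy from the voxel values of an ROI. The dictionary keys are illustrative labels, and the `ddof` argument is an assumption: the source does not state whether the standard deviation and variance are normalized by $N$ (`ddof=0`) or by $N-1$ (`ddof=1`), so both are left available.

```python
import numpy as np

def first_order_stats(voxels, ddof=0):
    """Basic first-order statistics of the voxel intensities in an ROI.

    ddof=0 divides by N, ddof=1 by N-1; which one the original
    software uses is not stated, so this is an explicit assumption.
    """
    x = np.asarray(voxels, dtype=float).ravel()
    mean = x.mean()
    return {
        "MaxIntensity": float(x.max()),
        "MinIntensity": float(x.min()),
        "MeanValue": float(mean),
        "MeanAbsoluteDeviation": float(np.mean(np.abs(x - mean))),
        "MedianIntensity": float(np.median(x)),
        "Range": float(x.max() - x.min()),
        "stdDeviation": float(x.std(ddof=ddof)),
        "Variance": float(x.var(ddof=ddof)),
        "VolumeCount": int(x.size),        # number of voxels in the ROI
        "VoxelValueSum": float(x.sum()),
    }

stats = first_order_stats([1, 2, 3, 4, 5])
print(stats["Variance"], stats["MeanAbsoluteDeviation"])
```

With either normalization the standard deviation is the square root of the variance, so the two features carry the same information on different scales.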

1.15
RelativeDeviation: Let $\bar{x}$ denote the mean of a set of quantities $x_i$; the relative deviation is the deviation of the quantities from $\bar{x}$ expressed relative to $\bar{x}$.

Frequency Size

1.17
Quantiles: Quantiles are cut points dividing the range of a probability distribution into contiguous intervals with equal probabilities, or dividing the observations in a sample in the same way. Quantile normalization is a global adjustment method that assumes the statistical distribution of each sample is the same; the normalization is achieved by forcing the observed distributions to match the average distribution, which is obtained by taking the average of each quantile across samples. For a finite population of $N$ equally probable values indexed $1, \dots, N$ from lowest to highest, the $k$-th $q$-quantile of this population can equivalently be computed via the value of: $I_p = N\,\frac{k}{q}$

1.18
Percentiles: A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. The percentile, $p\%$, of a distribution is defined as that value of the brightness $a$ such that $P(a) = p\%$, or equivalently $\int_{-\infty}^{a} p(\alpha)\,d\alpha = p\%$, where $p(\alpha)$ is the brightness probability density and $P(a)$ its cumulative distribution. The $P$-th percentile ($0 < P \le 100$) of a list of $N$ ordered values (sorted from least to greatest) is the smallest value in the list such that $P$ percent of the data is less than or equal to that value. This is obtained by first calculating the ordinal rank and then taking the value from the ordered list that corresponds to that rank. The ordinal rank $n$ is calculated using this formula: $n = \left\lceil \frac{P}{100} \times N \right\rceil$
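The nearest-rank procedure described above can be sketched in a few lines of Python. This assumes the usual nearest-rank convention, in which the ordinal rank is the ceiling of $P/100 \times N$; the function name is hypothetical.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Ordinal rank (1-based): smallest rank covering p percent of the data.
    n = math.ceil(p * len(ordered) / 100)
    return ordered[n - 1]

data = [15, 20, 35, 40, 50]
print([nearest_rank_percentile(data, p) for p in (30, 40, 50, 100)])
```

Because the result is always an element of the ordered list, no interpolation between neighbouring intensities is performed; interpolating variants of the percentile would return different values for small samples.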

Skewness
Skewness represents the degree of asymmetry of the distribution in the image histogram. In a symmetric distribution, the right and the left sides of the distribution are perfect mirror images of one another. The mean, median and mode are all measures of the center of a set of data, and the skewness of the data can be determined by how these quantities are related to one another.
[6] A high absolute value of skewness means that the distribution is asymmetric; otherwise the image histogram is more symmetric. A distribution with a relatively long left tail is said to have negative skewness (negative skew); the opposite is referred to as positive skewness (positive skew). Positive and negative skewness can be used to draw comparisons with the uniform distribution curve. Skewness is computed from the deviations of the voxel intensities $X$ from their mean $\bar{X}$.

Kurtosis
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers, whereas data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution would be the extreme case. [3] When kurtosis is small, the values are more concentrated; when kurtosis is larger, they are more dispersed. Positive and negative kurtosis are usually judged against the normal distribution curve: positive kurtosis indicates that the curve is smoother than the normal distribution curve, whereas negative kurtosis indicates that it is steeper.
Kurtosis is harder to picture than skewness, but an illustration helps. [Figure: illustration of kurtosis [4]. Three distributions, each with a mean of 0, a standard deviation of 1 and a skewness of 0, plotted on the same horizontal and vertical scales, with kurtosis increasing from left to right. [3]]
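Skewness and kurtosis can be sketched as standardized central moments. The code below is an illustration under stated assumptions rather than the software's own implementation: it uses population moments (division by $N$), and the `excess` flag, a hypothetical parameter, subtracts 3 so that a normal distribution scores 0 and negative values become possible, as the discussion of positive and negative kurtosis suggests.

```python
import numpy as np

def skewness_kurtosis(voxels, excess=False):
    """Moment-based skewness and kurtosis of the voxel intensities.

    Assumes population (1/N) moments; excess=True reports kurtosis
    relative to the normal distribution (kurtosis - 3).
    """
    x = np.asarray(voxels, dtype=float).ravel()
    d = x - x.mean()
    m2 = np.mean(d ** 2)                 # second central moment
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    skew = np.mean(d ** 3) / m2 ** 1.5   # standardized third moment
    kurt = np.mean(d ** 4) / m2 ** 2     # standardized fourth moment
    if excess:
        kurt -= 3.0
    return float(skew), float(kurt)

print(skewness_kurtosis([1, 2, 3, 4, 5]))   # symmetric sample
print(skewness_kurtosis([1, 1, 1, 10]))     # long right tail
```

A symmetric sample yields a skewness of zero, and a sample with a long right tail yields a positive value, in line with the description of positive and negative skew.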

Texture Parameters
Texture is one of the important characteristics used in identifying objects or regions of interest in an image. Texture represents the appearance of a surface and how its elements are distributed. It is considered an important concept in machine vision, in the sense that it assists in predicting the feel of the surface (e.g. smoothness, coarseness, etc.) from an image. Various texture analysis approaches represent the examined textures from different perspectives and, owing to the multi-dimensionality of perceived texture, no individual method is sufficient for all textures. Therefore, AK software is mainly concerned with improving texture classification accuracy using statistics-based texture feature methods.

Energy
This feature returns the sum of squared elements in the GLCM. Range = [0, 1]. Energy is 1 for a constant image. It is high when the image has very good homogeneity or when the pixels are very similar. The property Energy is also known as uniformity, uniformity of energy, and angular second moment. [2] Formula: $\mathrm{Energy} = \sum_{i}\sum_{j} g(i,j)^2$, where $i, j$ are the coordinates of the element $g(i,j)$.

Entropy
This is a measure of randomness, having its highest value when the elements of g are all equal. In the case of a checkerboard, the entropy would be low.

Correlation
Correlation measures the linear dependency of the grey levels of neighboring pixels; in other words, it measures the similarity of the grey levels in neighboring pixels and tells how correlated a pixel is to its neighbor over the whole image.

Inertia
It reflects the clarity of the image and the depth of the texture grooves. The contrast is proportional to the groove depth: deep grooves produce more clarity, whereas shallow grooves result in low contrast and a fuzzy image. [2] Formula: $\mathrm{Inertia} = \sum_{i,j} (i-j)^2\, g(i,j)$
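The four texture features described above (energy, entropy, correlation and inertia) can be illustrated on a small co-occurrence matrix `g`. The sketch assumes that `g` is normalized to sum to 1, that entropy uses a base-2 logarithm, and that correlation is the usual covariance of the row and column grey levels divided by their standard deviations; these are common textbook conventions rather than definitions confirmed by the source, and the function name is hypothetical.

```python
import numpy as np

def glcm_texture_features(g):
    """Energy, entropy, correlation and inertia of a co-occurrence matrix g."""
    g = np.asarray(g, dtype=float)
    g = g / g.sum()                          # normalise to joint probabilities
    i, j = np.indices(g.shape)               # grey-level index of each entry
    energy = np.sum(g ** 2)
    nz = g[g > 0]                            # zero entries contribute nothing
    entropy = -np.sum(nz * np.log2(nz))
    inertia = np.sum((i - j) ** 2 * g)       # weights entries far from the diagonal
    mu_i, mu_j = np.sum(i * g), np.sum(j * g)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * g))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * g))
    if sd_i == 0 or sd_j == 0:
        correlation = float("nan")           # undefined for a constant image
    else:
        correlation = np.sum((i - mu_i) * (j - mu_j) * g) / (sd_i * sd_j)
    return {"Energy": float(energy), "Entropy": float(entropy),
            "Correlation": float(correlation), "Inertia": float(inertia)}

# Horizontal neighbours in a two-level checkerboard always differ.
print(glcm_texture_features([[0.0, 0.5], [0.5, 0.0]]))
# All entries equal: the entropy reaches its maximum for a 2 x 2 matrix.
print(glcm_texture_features([[0.25, 0.25], [0.25, 0.25]]))
```

The checkerboard matrix has lower entropy than the matrix with all elements equal, which agrees with the statement that entropy is highest when the elements of g are all equal.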

Cluster Shade
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a common technique for statistical data analysis.
Cluster Shade: in clustered shading, similar view samples are grouped into clusters according to their position and, optionally, their normal.
In AK Software there are 36 parameters related to cluster analysis; the first 18 relate to Cluster Shade.

Cluster Prominence
Cluster Prominence is a measure of the asymmetry of a given distribution. High values of this feature indicate that the symmetry of the image is low. In medical imaging, low values of cluster prominence represent a smaller peak in the image grey level values, and the grey level difference between the forms is usually small.
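The source gives no formulas for the cluster features. As a hedged illustration only, the sketch below uses the definitions commonly found in the texture analysis literature, where cluster shade and cluster prominence are the third and fourth moments of $i + j$ about its mean under the normalized co-occurrence matrix. Whether the AK software uses exactly these definitions is an assumption, and the function name is hypothetical.

```python
import numpy as np

def cluster_shade_prominence(g):
    """Cluster shade (3rd moment) and prominence (4th moment) of i + j.

    Assumes the commonly used textbook definitions; g is a
    co-occurrence matrix that is normalised here to sum to 1.
    """
    g = np.asarray(g, dtype=float)
    g = g / g.sum()
    i, j = np.indices(g.shape)
    mu_i, mu_j = np.sum(i * g), np.sum(j * g)
    s = i + j - mu_i - mu_j                  # deviation of i + j from its mean
    shade = np.sum(s ** 3 * g)               # signed: zero for a symmetric matrix
    prominence = np.sum(s ** 4 * g)          # always non-negative
    return float(shade), float(prominence)

print(cluster_shade_prominence([[0.5, 0.0], [0.0, 0.5]]))    # symmetric case
print(cluster_shade_prominence([[0.75, 0.0], [0.0, 0.25]]))  # asymmetric case
```

Under these definitions a matrix that is symmetric about its mean has zero cluster shade, and both values grow as the distribution becomes more lopsided.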

Form Factor Parameters
This group of features includes descriptors of the three-dimensional size and shape of the tumor region. In the following definitions, let $V$ denote the volume and $A$ the surface area of the volume of interest. We determined the following shape- and size-based features: 3.1 Sphericity:

Surface area:
The surface area is calculated by triangulation (i.e. dividing the surface into connected triangles) and is defined as: $A = \sum_{i=1}^{N} \frac{1}{2}\left|\mathbf{a}_i\mathbf{b}_i \times \mathbf{a}_i\mathbf{c}_i\right|$
where $N$ is the total number of triangles covering the surface and $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are edge vectors of the triangles.

Maximum 3D diameter:
The maximum three-dimensional tumor diameter is measured as the largest pairwise Euclidean distance between voxels on the surface of the tumor volume.

3.6 Spherical disproportion: $\frac{A}{4\pi R^2}$
where $R$ is the radius of a sphere with the same volume as the tumor.

Surfacetovolumeratio:
$\text{surface to volume ratio} = \frac{A}{V}$

3.8 Volume:
The volume ($V$) of the tumor is determined by counting the number of voxels in the tumor region and multiplying this value by the voxel size.
The maximum 3D diameter, surface area and volume provide information on the size of the lesion. Measures of compactness, spherical disproportion, sphericity and the surface to volume ratio describe how spherical, rounded, or elongated the shape of the tumor is.
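To make the relationship between these shape measures concrete, the sketch below derives three of them from a volume `volume` ($V$) and a surface area `surface_area` ($A$). It assumes the standard sphericity definition $\pi^{1/3}(6V)^{2/3}/A$, which is not reproduced in the text above, together with the spherical disproportion $A/(4\pi R^2)$ for the radius $R$ of a sphere of equal volume. The function and key names are hypothetical.

```python
import math

def shape_features(volume, surface_area):
    """Sphericity, spherical disproportion and surface-to-volume ratio.

    Assumes the standard definitions; volume and surface_area must be
    positive and expressed in consistent units (e.g. mm^3 and mm^2).
    """
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface_area must be positive")
    # Radius of the sphere that has the same volume as the lesion.
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    sphericity = math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / surface_area
    disproportion = surface_area / (4.0 * math.pi * radius ** 2)
    return {
        "Sphericity": sphericity,
        "SphericalDisproportion": disproportion,
        "SurfaceToVolumeRatio": surface_area / volume,
    }

# A sphere of radius 2 versus a unit cube.
sphere = shape_features(4.0 / 3.0 * math.pi * 2.0 ** 3, 4.0 * math.pi * 2.0 ** 2)
cube = shape_features(1.0, 6.0)
print(sphere)
print(cube)
```

Under these definitions spherical disproportion is the reciprocal of sphericity, so a perfect sphere scores 1 on both and any other shape has a sphericity below 1 and a disproportion above 1.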

GLCM Parameters
The grey level co-occurrence matrix (GLCM) $g(i, j \mid d, \theta)$ represents the joint probability of certain sets of pixels having certain grey-level values. It counts how many times a pixel with grey level $i$ occurs jointly with another pixel having grey level $j$, for a given displacement vector $d$ (with distance $d$ and angle $\theta$) between each pair of pixels.
The advantage of the co-occurrence matrix is that the co-occurring pairs of pixels can be spatially related in various orientations, with reference to distance and angular spatial relationships, while considering the relationship between two pixels at a time. As a result, the combinations of grey levels and their positions are made apparent. The GLCM is therefore defined as "a two-dimensional histogram of gray levels for a pair of pixels, which are separated by a fixed spatial relationship". However, the matrix is sensitive to rotation: different offsets define pixel relationships in different directions.
Varying the rotation angle of the offset (0°, 45°, 90°, 135°) and the displacement distance (distance to the neighboring pixel: 1, 2, 3, ...) yields different co-occurrence distributions from the same reference image. The GLCM of an image is computed using a displacement vector $d$ defined by its radius (the distance, or count, to the next adjacent neighbor, preferably equal to one) and its rotation angle.
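The construction described above can be sketched for a small two-dimensional image. The code assumes that the image has already been quantized to integer grey levels in `0 .. levels-1`, and it maps the four angles to (row, column) offsets using the common convention that 90° points to the pixel above; the `OFFSETS` table, the function name and its parameters are hypothetical, not the AK software's interface.

```python
import numpy as np

# (row, column) step for each rotation angle, assuming rows grow downwards.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, levels, distance=1, angle=0, symmetric=False):
    """Normalised grey level co-occurrence matrix of a 2-D integer image."""
    img = np.asarray(image, dtype=int)
    dr, dc = (step * distance for step in OFFSETS[angle])
    g = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:    # neighbour inside image
                g[img[r, c], img[r2, c2]] += 1       # count the pair (i, j)
    if symmetric:
        g = g + g.T                                  # count both directions
    total = g.sum()
    return g / total if total > 0 else g

checkerboard = [[0, 1], [1, 0]]
print(glcm(checkerboard, levels=2, angle=0))    # neighbours always differ
print(glcm(checkerboard, levels=2, angle=45))   # diagonal neighbours are equal
```

The same image produces different matrices at 0° and 45°, which illustrates the rotation sensitivity noted in the text; averaging the matrices or the derived features over the four angles is one common way to reduce that dependence.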

Energy of GLCM
This feature returns the sum of squared elements in the GLCM. Range = [0, 1]. Energy is 1 for a constant image. It is high when the image has very good homogeneity or when the pixels are very similar. The property Energy is also known as uniformity, uniformity of energy, and angular second moment.

Entropy of GLCM
Entropy is a measure of the randomness of the intensity image.
Entropy shows the amount of image information that is needed for image compression. Entropy measures the loss of information or message in a transmitted signal and also measures the image information.
In AK Software there are 18 parameters related to the GLCM Entropy.