A CT radiomics analysis of COVID-19-related ground-glass opacities and consolidation: Is it valuable in a differential diagnosis with other atypical pneumonias?

Purpose To evaluate the discrimination of parenchymal lesions between COVID-19 and other atypical pneumonia (AP) by using only radiomics features. Methods In this retrospective study, 301 pneumonic lesions (150 ground-glass opacity [GGO], 52 crazy paving [CP], 99 consolidation) obtained from nonenhanced thorax CT scans of 74 AP (46 male and 28 female; 48.25±13.67 years) and 60 COVID-19 (39 male and 21 female; 48.01±20.38 years) patients were segmented manually by two independent radiologists, and Location, Size, Shape, and First- and Second-order radiomics features were calculated. Results Multiple parameters showed significant differences between AP and COVID-19-related GGOs and consolidations, although only the Range parameter was significantly different for CPs. Models developed by using the Bayesian information criterion (BIC) for the whole group of GGO and consolidation lesions predicted COVID-19 consolidation and AP GGO lesions with low accuracy (46.1% and 60.8%, respectively). Thus, instead of subjective classification, lesions were reclassified according to their skewness into positive skewness group (PSG, 78 AP and 71 COVID-19 lesions) and negative skewness group (NSG, 56 AP and 44 COVID-19 lesions), and group-specific models were created. The best AUC, accuracy, sensitivity, and specificity were respectively 0.774, 75.8%, 74.6%, and 76.9% among the PSG models and 0.907, 83%, 79.5%, and 85.7% for the NSG models. The best PSG model was also better at predicting NSG lesions smaller than 3 mL. Using an algorithm, 80% of COVID-19 and 81.1% of AP patients were correctly predicted. Conclusion During periods of increasing AP, radiomics parameters may provide valuable data for the differential diagnosis of COVID-19.

Introduction
where f(x) is the normalized voxel density, s is a scaling factor (set to 1), x is the original density, μ_x is the mean density calculated from all voxels of the slice inside and outside the segmentation, and σ_x is the standard deviation.
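The symbol definitions above are consistent with a z-score style normalization of voxel densities. As an assumption rather than a formula confirmed by the surrounding text, it could be written as:

```latex
\[
  f(x) = \frac{s\,(x - \mu_x)}{\sigma_x}
\]
```

With s = 1, a voxel whose density equals the slice mean maps to 0, and a voxel one standard deviation above the mean maps to 1.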
We mainly worked with negative HU values and avoided adding a fixed positive integer (voxel array shift) to the measured HU values for the calculation of the total energy parameter, thus avoiding the volume confounding effect. Neighbor distance was set to 1 mm, and the 0°, 45°, 90° and 135° angles from the center voxel were examined isotropically (13 directions). For the dependence matrices, neighboring voxels were considered dependent on the center voxel if both were equal in gray level.
Statistical analysis. Lesions were classified as ground-glass opacities (GGOs), crazy paving signs (CPs) or consolidations by the two radiologists (MG and BARM) separately. If there was inconsistency in the classification, a joint decision was made. All pneumonic lesions present in patients were classified, and all classified lesions were used for segmentation, feature extraction and parameter calculation and statistical evaluation.
The GGO (n = 150), CP (n = 52) and consolidation (n = 99) groups were evaluated separately to assess the differences in the parameters as they related to COVID-19 and AP. Since there were multiple outliers, especially in the parameters correlated strongly with volume, a nonnormal distribution was very often observed in the group comparisons. A logarithmic transformation was performed, and the t-test was used if the transformed data were normally distributed. Otherwise, the Kruskal-Wallis and post hoc Mann-Whitney U tests were used. The First-order parameters were also evaluated for their ability to distinguish between different lesion types.

PLOS ONE
The assumption of linearity for the parameters in the model was evaluated with the Box-Tidwell procedure. The adequacy of the model parameters in predicting the categorical outcomes was evaluated with the Hosmer-Lemeshow goodness-of-fit test, and p > 0.05 was considered to indicate a good fit. Multicollinearity of the parameters in the models (such as between Range and Interquartile range or Flatness and Spherical Disproportion) was evaluated by the variance inflation factor (VIF) of the linear regression test.
Model validation was performed using leave-one-out cross-validation (LOOCV) at the patient level. For this purpose, all lesions belonging to the same patient were turned into a separate block and formed the test group, and the remaining lesions comprised the training group. Thus, validation was conducted 134 times for the GGO and consolidation group, 86 times for the positive skewness group (PSG) and 68 times for the negative skewness group (NSG).
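The patient-level blocking described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `leave_one_patient_out`, the dictionary keys and the toy lesions are all hypothetical.

```python
from collections import defaultdict


def leave_one_patient_out(lesions):
    """Yield (held_out_patient, train, test) splits.

    All lesions of one patient form the test block; the lesions of every
    other patient form the training block. One split per patient.
    """
    by_patient = defaultdict(list)
    for lesion in lesions:
        by_patient[lesion["patient_id"]].append(lesion)

    for held_out, test in by_patient.items():
        train = [
            lesion
            for patient_id, group in by_patient.items()
            if patient_id != held_out
            for lesion in group
        ]
        yield held_out, train, test


# Hypothetical toy data: three patients, five lesions.
lesions = [
    {"patient_id": "P1", "label": "COVID-19"},
    {"patient_id": "P1", "label": "COVID-19"},
    {"patient_id": "P2", "label": "AP"},
    {"patient_id": "P3", "label": "AP"},
    {"patient_id": "P3", "label": "AP"},
]

folds = list(leave_one_patient_out(lesions))
print(len(folds))  # one fold per patient
```

Because the folds are defined by patient rather than by lesion, the number of validation rounds equals the number of patients in each group, which is why the text reports 134, 86 and 68 rounds.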

Pulmonary lesions
A total of 153 (50.8%) COVID-19 lesions (1-6 lesions/patient; mean 2.55) and 148 (49.2%) AP lesions (1-4 lesions/patient; mean 2.00) were obtained from the study population (Table 1). The limited number of segmented lesions used in the study had two causes: (1) lesions smaller than 1 mL in volume were not used in the study, and (2) lesions that merged with each other were processed as a single lesion. No significant difference was found in the number of GGO lesions between the COVID-19 and AP groups (p = 0.721, post hoc z-score test), although there was a significant difference in the number of CP (p < 0.001) and consolidation (p = 0.003) lesions.
Two authors (MG and BARM) segmented the lesions separately using the same rules. The mean volume (26.855±34.445 mL and 28.211±37.092 mL, respectively) and mean density (0.590±0.162 and 0.579±0.173, respectively) of the segmentations were compared using Bland-Altman analysis and the paired samples t-test, and no significant difference was found for the calculated mean volume (p = 0.589) or density (p = 0.154).

Shapes, sizes and locations of the lesions
The volumes of the lesions were spread over a wide range (1.235-165.504 mL in the COVID-19 group and 1.022-176.192 mL in the AP group) with nonnormal distributions and wide variability (mean 23.565 ± 32.181 mL in the COVID-19 group and 30.756 ± 36.696 mL in the AP group). However, there was no significant difference in the lesion volume between the COVID-19 and AP groups (p = 0.084, Kruskal-Wallis test).
The Shape parameters Sphericity, Compactness, Spherical Disproportion, Elongation and Flatness demonstrated a tendency of the COVID-19 lesions to be more rounded (Table 3). Receiver operating characteristic (ROC) analysis showed that the Shape and Size parameters individually had poor sensitivity, specificity and AUC values in discriminating lesions (Table 3).
In the COVID-19 group, 105 lesions (68.6%) were located peripherally, 27 lesions (17.7%) were located centrally and 21 lesions (13.7%) were located diffusely; in the AP group, these numbers were 60 (40.5%), 27 (18.2%) and 61 (41.2%), respectively. The numbers of lesions of each location type were significantly different between the groups (p < 0.001, chi-square test). In our study group, the cause of the pneumonic lesions was correctly predicted for 64.5% of the lesions (sensitivity 60.1% and specificity 68.6%) by using their location data (peripheral or nonperipheral) only.

First-order texture parameters
The First-order texture parameters discriminated poorly between COVID-19 and AP lesions when the whole group was considered (Table 2); however, these parameters were found to be effective in categorizing the pneumonic lesion types (Table 3). While consolidations yielded negative skewness unless they had extensive ground-glass halo areas (Fig 2), GGO lesions with a volume greater than 3.0 mL had positive skewness values. GGO lesions smaller than this volume showed negative skewness, and skewness maps showed that voxels corresponding to enlarged septal or vascular structures led to a right shift (Fig 3). The mean skewness was found to be significantly different among GGO, CP and consolidation lesions.
Several parameters were found to be significantly different for GGO and consolidation lesions between COVID-19 and AP (Table 3), and Range was the only parameter that could discriminate all lesion types in both disease groups and had the best AUC in ROC analysis, although its specificity was 45.5%.

Second-order texture parameters
The Second-order texture parameters also discriminated poorly between COVID-19 and AP lesions (Table 4). Only the parameters Large Area Low Gray Level Emphasis and GLCM-Correlation had AUCs greater than 0.600, although their sensitivity in differentiating COVID-19 lesions was merely 50%. None of the Second-order parameters showed a significant difference for the CP lesions between the COVID-19 and AP groups. However, several parameters differed significantly for GGO and consolidation lesions (Table 5).

Models for lesion estimation
Logistic regression probabilistic models were developed since no individual parameter showed good discriminability. A total of 18 First- and Second-order parameters discriminated both GGO and consolidation lesions (Tables 3 and 5), and no parameters other than Range could discriminate CP lesions between COVID-19 and AP. Thus, models focusing on GGOs and consolidations were built. These 18 parameters were merged with the Shape and Size parameters with AUCs greater than 0.600 (Table 2) as well as the Location parameter; thus, a total of 23 parameters were used to generate the models. The parameters were logarithmically transformed to prevent the model from being influenced by outliers and the skewness values of the parameters. All possible three- and four-parameter combinations were studied for candidate models. No more than four parameters were used to build the models to prevent overfitting. The Bayesian information criterion (BIC) estimator was used to select the best among the candidate models.
The group consisting of all consolidation and GGO lesions was the largest group and included 134 patients and 249 lesions; the best three- and four-parameter models that predicted both types of lesion showed modest sensitivity and specificity for both the training and test sets (Models 1 and 2, Table 6). Further evaluation of the subgroups revealed that the accuracy for COVID-19 consolidations was 46.1% with Model-1 and 56.4% with Model-2.
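The size of the candidate search and the BIC comparison can be illustrated with a short sketch. It assumes the standard definition BIC = k·ln(n) − 2·ln(L), with k counting the intercept plus the predictors; the parameter names and log-likelihood values below are hypothetical placeholders, not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_by_bic(candidates, n_obs):
    """Return the (predictors, log_likelihood) pair with the lowest BIC.

    The intercept is counted as one extra fitted parameter.
    """
    return min(
        candidates,
        key=lambda c: bic(c[1], len(c[0]) + 1, n_obs),
    )


# Number of candidate models when choosing 3 or 4 of the 23 parameters.
n_three = math.comb(23, 3)
n_four = math.comb(23, 4)
print(n_three, n_four)  # 1771 8855

# Hypothetical fitted log-likelihoods for two candidate models on 249 lesions.
candidates = [
    (("A", "B", "C"), -150.0),
    (("A", "B", "C", "D"), -148.5),
]
best = best_by_bic(candidates, n_obs=249)
print(best[0])
```

In this toy comparison the fourth predictor improves the log-likelihood too little to offset the ln(n) penalty, so the three-parameter candidate is preferred.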

Similarly, both models had low accuracy for AP-related GGO lesions (60.8% and 54.1%, respectively). The high accuracy in predicting AP-related consolidation lesions (90% and 83.3%) and COVID-19-related GGO lesions (82.9% and 81.5%), which both constituted 54.6% of the lesions, made the models appear more successful than they actually were. Since low accuracy affected the consolidation and GGO subgroups in the single-model approaches, we decided to study them separately. In our study, there were only a few pure consolidations (7 COVID-19, 6 influenza, 6 adenovirus and 1 Legionella pneumophila-associated lesions), and almost all lesions included both ground-glass and consolidation areas, which posed a classification problem. We concluded that separating the lesions according to their skewness values would eliminate the need for a subjective decision-maker; thus, the lesions were grouped into the PSG (n = 149) and NSG (n = 100) according to their skewness values.
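The skewness-based regrouping can be sketched as follows. This is a minimal illustration, assuming skewness is the third central moment divided by the cube of the standard deviation (population moments). The function names, the toy voxel values and the handling of exactly zero skewness (assigned to the NSG here) are assumptions that the text does not specify.

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # flat region: skewness is undefined, treated as zero here
    return m3 / m2 ** 1.5


def skewness_group(values):
    """Assign a lesion to the positive (PSG) or negative (NSG) skewness group."""
    return "PSG" if skewness(values) > 0 else "NSG"


# Hypothetical normalized voxel densities.
# Mostly low-density voxels with a few dense ones (GGO-like histogram).
ggo_like = [-0.9, -0.8, -0.8, -0.7, -0.7, -0.6, 0.2, 0.9]
# Mostly dense voxels with a few low-density ones (consolidation-like histogram).
consolidation_like = [-v for v in ggo_like]

print(skewness_group(ggo_like), skewness_group(consolidation_like))
```

The rule needs no reader judgment: the histogram tail alone decides the group, which is the property the text relies on to avoid a subjective classification.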
The PSG included 78 lesions (52.3%) from 49 patients with AP and 71 (47.7%) lesions from 37 COVID-19 patients. While 142 of these lesions were GGOs, 7 were consolidations with wide ground-glass halos. The best models for PSG lesion prediction always included the parameters GLCM-Contrast and Range. The BIC analysis showed that the best 3-parameter model was obtained by adding Sphericity (Model-3, Table 6). Higher values of the parameters GLCM-Contrast and Range increased the likelihood of the lesion being identified as an AP lesion; in contrast, a higher value for the Sphericity parameter increased the likelihood of the lesion being identified as a COVID-19 lesion. There were 20 COVID-19 and 11 AP lesions with a Sphericity value greater than 0.500, and Model-3 correctly predicted 17 (85.0%) COVID-19 and 8 (72.7%) AP lesions, showing that the model had no tendency to classify the most rounded lesions as COVID-19. In the cross-validation study, Model-3 had the best sensitivity, specificity and accuracy for PSG lesions (Table 6).
When the parameters evaluating the shape of the lesion were not used during model creation, the best model included the Lesion location parameter (Model-4, Table 6). Such models  The inclusion of both Sphericity and Lesion location to build a four-parameter model yielded a model with slightly lower sensitivity, specificity and accuracy (Model-5, Table 6). This model correctly predicted 48.6% of the peripherally located AP lesions, similar to Model-4 (case-by-case estimates were not exactly the same), and 82.7% of the peripheral COVID-19 lesions.
The best 4-parameter model according to the BIC analysis included the Interquartile range and Lesion location parameters (Model-6, Table 6). This model had the same sensitivity as Model-3 (case-by-case estimates were not exactly the same) and the second-best specificity and accuracy after Model-3. The VIF for Interquartile range and Range was calculated as 1.006.
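To put the reported VIF in context, the standard definition can be inverted to recover the shared variance. This is a worked illustration that assumes the usual formula, where R_j^2 is the coefficient of determination obtained by regressing parameter j on the other parameters in the model:

```latex
\[
  \mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
  \quad\Longrightarrow\quad
  R_j^{2} = 1 - \frac{1}{\mathrm{VIF}_j} = 1 - \frac{1}{1.006} \approx 0.006
\]
```

Under that definition, a VIF of 1.006 corresponds to well under 1% shared variance, that is, negligible collinearity between the two parameters within this model.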
Five out of 7 consolidations with wide ground-glass halos (5 AP and 2 COVID-19) were correctly predicted, and the same 1 AP and 1 COVID-19 lesions could not be correctly predicted by any of the positive skewness models.
As a result, the best accuracy was achieved with Model-3, and the score for a lesion was calculated as:
The NSG included 56 (56%) AP lesions in 39 patients and 44 (44%) COVID-19 lesions in 29 patients. The NSG was primarily composed of consolidations (92 lesions). There were 8 GGO lesions. The models for NSG prediction did not include Shape or Location parameters. Two texture parameters, GLCM-Inverse Difference Normalized (IDN) and the First-order parameter Mean Absolute Deviation (MAD), formed the basis of NSG estimation.
In the BIC analysis, the lowest-scoring 3-parameter model was formed by adding Spherical Disproportion to the above 2-parameter model (Model-7, Table 6), and the lowest-scoring 4-parameter model was formed by adding the Flatness parameter to this 3-parameter model (Model-8, Table 6). However, IDN and MAD were the only statistically significant parameters in the models. When the Spherical Disproportion parameter was replaced with Sphericity (p = 0.072) or Lesion location (p = 0.386), these parameters were also unable to create a
Accordingly, the NSG score for Model-7 is depicted in Eq 3:
\[ \text{NSG Score} = \frac{1}{1 + e^{-\left(-24.338 - \left(5.290\,\log_{10}(\text{Spherical Disproportion}) + 18.715\,\log_{10}(\text{MAD}) + 276.037\,\log_{10}(\text{IDN})\right)\right)}} \tag{3} \]
Five out of 8 GGO lesions smaller than 3 mL that produced negative skewness values were correctly predicted by the NSG models. On the other hand, Model-3 of the PSG was able to accurately predict all of these lesions. The Sphericity of the AP GGO lesion (from a patient diagnosed with RSV) was 0.362, while the COVID-19-related GGO lesions had Sphericity values between 0.550 and 0.669.
The net benefit provided by the highest accuracy PSG and NSG models (Model-3 and Model-7, respectively) was evaluated with decision curve analysis, using all possible threshold probabilities. While the PSG model did not differ from an approach in which all lesions were evaluated as COVID-19 for low threshold probabilities, it had a higher net benefit in the intermediate and high threshold probability range (0.21-0.82) (Fig 4A). On the other hand, the NSG model provided a higher net benefit at all threshold probabilities (Fig 4B).
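The quantity compared in a decision curve can be sketched as follows. This assumes the standard net benefit definition, TP/n − (FP/n)·pt/(1 − pt), with COVID-19 as the positive class; the function names and the toy predictions are hypothetical and are not taken from the study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of calling a lesion positive when its score >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that calls every lesion positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical model scores (1 = COVID-19, 0 = AP).
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

nb_model = net_benefit(probs, labels, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(nb_model, nb_all)  # 0.25 0.0
```

Repeating this calculation over a grid of thresholds and plotting both curves gives the kind of comparison shown in a decision curve: the model is useful at a given threshold when its net benefit exceeds that of the treat-all and treat-none (net benefit 0) strategies.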

Case-by-case evaluation
The results of Model-3 and Model-7, which had the highest accuracies in the cross-validation, were evaluated on a case-by-case basis. Regardless of the number of segmentations, the model was considered unsuccessful for a patient who had one falsely predicted lesion. In the PSG, 10 AP (2 single and 8 multiple lesions) and 13 COVID-19 (10 single and 3 multiple lesions) patients were falsely predicted by Model-3. The remaining 39 AP (79.6%) and 24 COVID-19 patients (64.9%) were correctly predicted.

In contrast, 1 AP and 4 COVID-19 patients presented with a total of 8 GGO lesions with a volume less than 3 mL. The NSG model incorrectly predicted the AP patient and 1 COVID-19 patient, while the PSG model correctly predicted them all.
According to these results, we derived two basic rules and a simple algorithm for patient evaluation: (1) if a patient has both PSG and NSG lesions, the prediction should be made using the NSG lesion(s) and the NSG model; and (2) a lesion with a volume of less than 3 mL should be evaluated with the PSG model. Using an algorithm based on these rules, our final accuracy was 80% for COVID-19 and 81.1% for AP (Fig 5).
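One possible reading of the two rules is sketched below. It is not the algorithm of Fig 5: the function names, the dictionary keys, the order in which the rules are applied (the volume rule first) and the treatment of zero skewness are all assumptions, and the sketch stops at choosing the model and lesions without combining lesion scores into a patient-level prediction.

```python
SMALL_LESION_ML = 3.0  # volume cut-off from rule (2)


def model_for_lesion(skewness, volume_ml):
    """Pick the model for one lesion; small lesions always go to the PSG model."""
    if volume_ml < SMALL_LESION_ML:
        return "PSG"
    return "PSG" if skewness > 0 else "NSG"


def lesions_for_patient(lesions):
    """Return (model_name, lesions_to_score) for one patient.

    Rule (1): if any lesion is routed to the NSG model, only those
    lesions are scored, and they are scored with the NSG model.
    """
    routed = [
        (model_for_lesion(les["skewness"], les["volume_ml"]), les)
        for les in lesions
    ]
    nsg = [les for model, les in routed if model == "NSG"]
    if nsg:
        return "NSG", nsg
    return "PSG", [les for _, les in routed]


# Hypothetical patient with one large consolidation and one GGO lesion.
patient = [
    {"name": "consolidation", "skewness": -0.8, "volume_ml": 25.0},
    {"name": "ggo", "skewness": 0.6, "volume_ml": 12.0},
]
model, selected = lesions_for_patient(patient)
print(model, [les["name"] for les in selected])
```

A patient whose only lesion is a small GGO with negative skewness would be routed to the PSG model under this reading, in line with rule (2).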

Power analysis
An a priori power analysis was performed for independent samples t tests to determine the sample size of the COVID-19 and AP groups. The parameters used for this purpose were two tails, Cohen's d = 0.5, alpha = 0.05 and targeted power = 0.80, and the total sample size was calculated as 128.
During the study, the COVID-19 group consisted of 60 cases and the AP group consisted of 74 cases (total n = 134) and the power was calculated as 0.82 with post hoc analysis.
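The reported sample size and post hoc power can be approximately reproduced with a normal approximation. This sketch is an assumption about the kind of calculation involved, not the software the authors used: it applies the common z-based formula with a small-sample correction term (z²/4) for the sample size, and a normal approximation to the noncentral t distribution for the power.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed independent samples t-test."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(power)
    n = 2.0 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4.0
    return math.ceil(n)


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate two-tailed power for unequal group sizes."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    noncentrality = d * math.sqrt(n1 * n2 / (n1 + n2))
    return (STD_NORMAL.cdf(noncentrality - z_alpha)
            + STD_NORMAL.cdf(-noncentrality - z_alpha))


per_group = n_per_group(0.5)
total = 2 * per_group
power = approx_power(0.5, n1=60, n2=74)
print(per_group, total)   # 64 128
print(round(power, 2))    # 0.82
```

An exact calculation based on the noncentral t distribution may differ slightly in the third decimal place, but it leads to the same total of 128 and a power of about 0.82 for the 60 and 74 cases actually enrolled.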

Discussion
A total of 89 radiomic parameters, including Lesion location, Size, Shape, First-order and Second-order texture parameters, were evaluated, and none could individually differentiate COVID-19 from AP with sufficient sensitivity and specificity. For this reason, a method based on the estimation of lesions was adopted by creating models with logistic regression analysis. Although 24 parameters were significantly different between COVID-19 and AP-related GGO and consolidations, none of the parameters, except Range, could differentiate CP lesions between the two disease groups. Thus, we focused on creating models that predicted GGOs and consolidations.
In our study, a one-model-for-all-lesions approach resulted in false estimates in the GGO and consolidation subgroups. Fang et al. reported a single model that could differentiate COVID-19 and influenza with high AUCs [22]. However, their model was not composed of radiomic parameters alone but also included parameters such as mediastinal lymphadenopathy and pleural effusion that are rarely reported for COVID-19 [2].
The models that predicted PSG (predominantly GGO) lesions always included the parameters GLCM-Contrast and Range. GLCM-Contrast represents the local gray-level variations within the lesion, and wrinkled images or images with edges have high values [23]. Range is the difference between the highest and lowest voxel densities in the lesion. As the values of these two parameters increased, the possibility of the model classifying the lesion as an AP lesion also increased. The incorporation of these two parameters allowed the models to predict round or peripheral AP lesions with good specificity. The Expert Consensus Statement on COVID-19 reporting describes a typical COVID-19 lesion as a peripheral, bilateral, round GGO lesion [2]. Our study results for GGO lesions are consistent with the statement, as we found that Sphericity and Lesion location were prominent parameters. In the training and validation sets, the model with the highest accuracy contained the Sphericity parameter. On the other hand, models with standardized parameters showed that the Lesion location parameter had the highest odds ratio. Coronaviruses other than SARS-CoV-2 and influenza virus lead to peripheral involvement more often than other AP viruses [2,24-26]. In our study, the proportion of peripheral lesions in the AP group (excluding CP lesions) was 47%, and the models with the Lesion location parameter misclassified AP lesions slightly more often than the model with the Sphericity parameter. Although the location-based models were successful on the positive class side (COVID-19), their low number of true negatives (AP) explains the low accuracy they achieved despite their high AUCs.
In the NSG (predominantly consolidations), neither shape nor location had a significant effect on the models' performance. It has been reported that as COVID-19 progresses, round GGO lesions tend to evolve into patchy GGO lesions and consolidations [26]. Consolidation has been reported in up to 64% of influenza-related pneumonias [25,27], and lobar and segmental consolidations are known to develop in pulmonary infections associated with influenza virus, adenovirus and human coronaviruses other than SARS-CoV-2 [28]. The NSG models were based on the parameters IDN and MAD. IDN measures local homogeneity, and larger values indicate a more homogeneous texture on a local scale [23]. The parameter MAD measures the distribution of voxels, and after the parameters SD, Range and IR, another parameter evaluating the voxel distribution was included in a model. It was found that despite the wider gray scale distribution of the voxels of the AP-related consolidations, as we also found for the GGOs, their local homogeneities were also greater than those of COVID-19-related consolidations.
Positively skewed COVID-19 GGO lesions with a volume of less than 3 mL were distinctly spherical, while one AP GGO lesion had a low sphericity. Thus, while NSG models were not successful for small volume lesions, Model 3 accurately predicted all lesions. Although spherical lesions in measles and varicella-zoster virus-related pneumonia have already been described [28], the shape-related features of small AP lesions should be investigated in future studies.
In this study, no parameter other than a radiomic feature was included in the models. Additionally, AP patients with tree-in-bud and pleural effusion were not included in the study; thus, the discriminability between same-category COVID-19 pneumonia and AP-associated lesions was investigated. The NSG models, which were applied mostly to consolidations, showed higher accuracy than the PSG models, which were applied mainly to GGO lesions. Decision curve analysis showed that our models could provide a higher net benefit than a theoretical condition in which all lesions were evaluated as COVID-19 or AP lesions. Moreover, with the algorithm described in our study, an accuracy of at least 80% was achieved for both the COVID-19 pneumonia and AP groups without using any data other than radiomic parameters.
Reproducibility is the main problem of radiomics studies [16]. Although there are suggested methods to compensate for device and protocol-related differences [29], the images were obtained from the same device and protocol in our study. Additionally, the voxel densities were implemented as normalized values, not directly as Hounsfield units. Although some software can discriminate healthy and diseased parenchymal areas at the lung scale for processing all individual lesions as one large composite lesion [20], predicting different lesion types with a single model led to false negativity in our study. The lesions were manually segmented with simple rules, and we showed that there was no significant difference between the segmentations of the different observers.
This study has some limitations. First, there were few serologically diagnosed AP patients in the retrospective screening. Additionally, the mean number of lesions detected per AP patient was lower than that among COVID-19 patients. Since it is necessary to work with balanced groups to demonstrate the effectiveness of the models, the largest possible AP group was created, and then the number of COVID-19 patients required was determined according to the results of a power analysis. Thus, our sample size was relatively small. Second, the time between the onset of symptoms and the CT scan was slightly longer in the AP group, and the number of consolidations recorded in this group was also higher. Since only the CT examination obtained for each patient prior to antiviral treatment was included in the study, no follow-up scans were used. Finally, an effective model for CP lesions could not be developed with the methods that we used to calculate the radiomics features. In the future, we aim to develop efficient models by using series containing more AP-associated CP lesions and different calculation methods.
In conclusion, using lesion-dedicated models consisting of only radiomics parameters and an algorithm that matched each lesion type with the appropriate model, we showed that COVID-19- and AP-associated GGO lesions and consolidations could be predicted with good accuracy. Our validation studies showed that roundness and peripheral location were the strongest parameters for associating a GGO lesion with COVID-19, although both were found to be ineffective in predicting a lesion in the consolidation stage.