
Computer-based quantitative image texture analysis using multi-collinearity diagnosis in chest X-ray images

  • Antonio Quintero-Rincón ,

    Contributed equally to this work with: Antonio Quintero-Rincón, Ricardo Di-Pasquale, Karina Quintero-Rodríguez, Hadj Batatia

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    antonioquintero@uca.edu.ar

    Affiliations Department of Data Science, Data Science and AI Laboratory, Catholic University of Argentina (UCA), Buenos Aires, Argentina, Department of Computer Sciences, Catholic University of Argentina (UCA), Buenos Aires, Argentina

  • Ricardo Di-Pasquale ,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Data Science, Data Science and AI Laboratory, Catholic University of Argentina (UCA), Buenos Aires, Argentina

  • Karina Quintero-Rodríguez ,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation “Prof. Dr. Juan P. Garrahan” Pediatric Hospital, Medical Image Department, Buenos Aires, Argentina

  • Hadj Batatia

    Roles Conceptualization, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation MACS School, Heriot-Watt University, Dubai, United Arab Emirates

Abstract

Despite tremendous efforts devoted to the area, image texture analysis is still an open research field. This paper presents an algorithm and experimental results demonstrating the feasibility of developing automated tools to detect abnormal X-ray images based on tissue attenuation. Specifically, this work proposes using the variability characterised by singular values and conditional indices extracted from the singular value decomposition (SVD) as image texture features. In addition, the paper introduces a “tuning weight” parameter to consider the variability of the X-ray attenuation in tissues affected by pathologies. This weight is estimated using the coefficient of variation of the minimum covariance determinant from the bandwidth yielded by the non-parametric distribution of variance-decomposition proportions of the SVD. When multiplied by the two features (singular values and conditional indices), this single parameter acts as a tuning weight, reducing misclassification and improving the classic performance metrics, such as true positive rate, false negative rate, positive predictive values, false discovery rate, area-under-curve, accuracy rate, and total cost. The proposed method implements an ensemble bagged trees classification model to classify X-ray chest images as COVID-19, viral pneumonia, lung opacity, or normal. It was tested using a challenging, imbalanced chest X-ray public dataset. The results show an accuracy of 88% without applying the tuning weight and 99% with its application. The proposed method outperforms state-of-the-art methods, as attested by all performance metrics.

1 Introduction

Image texture analysis, quantification, and recognition are active research topics in biomedical imaging, computer vision, and pattern recognition. In the biomedical context, texture arises from the micro- and macro-structural patterns of biological tissues [1]. Physicians are trained to visually interpret texture information across various imaging modalities, such as radiographic X-rays. The principle behind these anatomical images is based on the differences in attenuation among tissues, which are influenced by the material’s atomic number, tissue density, photon energy, and material thickness. Greater tissue density leads to increased attenuation. Chest X-rays are specifically employed to assist physicians in examining the anatomy of the lungs and heart. Pixel intensities correspond to the density of matter integrated along rays, often analysed according to their texture. Tissue attenuation in X-rays has been extensively studied in medical applications. Existing methods use various techniques to improve low contrast and low dynamic ranges to help discriminate tissues and precisely identify organs, bones, tumours, and nodules. Recent studies have focused on methods to remove [2] or amplify [3] tissue components, using multiscale Shannon-Cosine Wavelet models [4] or by adjusting parametric models based on the component attenuation, contrast, and image fusion [5]. Deep convolutional neural networks (CNN), such as VGG, ResNet, DenseNet, and DeTraC, have been applied directly [6–9], or as means of representation learning combined with conventional machine learning models [10]. Other deep learning methods have been proposed, including the triplet-constrained deep hashing [11], vision-transformer [12], the dual-ended multiple attention learning models (DMAL) [13], the centralised and federated learning [14], and Wasserstein distance and discrepancy metric [15].

In high-dimensional data analysis, such as image processing or computer vision, features can be correlated, making learning difficult. Various dimensionality reduction techniques exist, including subspace, manifold-based, and shallow and deep neural network methods. Principal Component Analysis (PCA) is a dimensionality reduction technique that belongs to subspace-based methods [16]. PCA aims to project the data onto fewer dimensions while preserving its inherent statistical patterns. Technically, PCA identifies the components along which the data matrix has the maximum variance. It has applications in data exploration, noise reduction, feature extraction, and data compression.

The most straightforward method to calculate PCA is through the conventional eigenvalue decomposition of the covariance matrix. However, Singular Value Decomposition (SVD) is the standard method used to prevent computational and numerical issues. SVD is a matrix factorization technique that decomposes a matrix X into the product of three matrices, X = U Σ Vᵀ, where U and V are orthogonal matrices containing the left and right singular vectors, respectively, and Σ is a diagonal matrix with singular values indicating the importance of each component. Principal components are extracted by selecting the desired number (k) of singular values and the corresponding columns of V (denoted V_k). The lower-dimensional representation of X can then be computed as Z = X V_k. For more details about PCA and SVD, we refer the reader to [17–19].
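As a brief illustration of the PCA-via-SVD computation just described, the following minimal Python/NumPy sketch (our own example, not the paper's code) centres the data, applies SVD, and projects onto the first k right singular vectors:

```python
import numpy as np

def pca_svd(X, k):
    """Project the rows of X onto the first k principal components via SVD."""
    Xc = X - X.mean(axis=0)                           # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False) # s is descending
    V_k = Vt[:k].T                                    # first k right singular vectors
    return Xc @ V_k, s                                # projection Z = Xc V_k, singular values

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, s = pca_svd(X, 2)
```

Here `Z` holds the lower-dimensional representation and `s` the singular values, which `np.linalg.svd` returns in descending order.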

This work proposes analysing the variability of tissue attenuation as a texture phenomenon in X-ray images using singular value decomposition (SVD). Singular values and conditional indices are proposed as textural features used in a multiclass learning model to classify clinical chest X-ray images as normal, COVID-19, viral pneumonia, or lung opacity. Note that the conditional indices of a matrix are derived using the well-known SVD method [20]. They are usually used in regression methods to diagnose multi-collinearity problems [21]. To our knowledge, they have never been used as texture features.

SVD is commonly used in image texture analysis, where the spatial arrangement of grayscale pixels in a neighbourhood is considered to characterize phenomena present in the image. This technique is used to solve problems related to segmentation, classification, and synthesis by using statistical, structural, model-based, and transform-based methods. These methods extract textural properties to describe image texels. Texels are texture units arranged in ways that can be characterized by specific feature descriptors, which in turn use texture operators [1]. The most frequently used texture descriptors relate to coarseness, homogeneity, density, fineness, smoothness, linearity, directionality, granularity, and frequency [22]. Table 1 summarises the most popular textural operators.

For experimentation, singular values and conditional indices were used to analyse the variability of tissue attenuation of X-rays in COVID-19, viral pneumonia, lung opacity, and normal cases. The aim is to show that the two features characterise this variability well and can be used to discriminate COVID-19 cases.

Coronavirus disease 19 (COVID-19) is caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Infected individuals have been reported with typical clinical symptoms involving fever, non-productive cough, myalgia, shortness of breath, and normal or decreased leukocyte count. Severe cases of infection cause pneumonia, severe acute respiratory syndrome, multi-organ failure, and death [23]. It is well known that false-positive diagnoses often lead to more expensive follow-up tests and patient anxiety, while false-negative diagnoses may result in death if treatable conditions are not identified. Chest X-rays are a crucial tool for diagnosing lung infections in the medical field. The real challenge is identifying the diagnosis when an opacity is seen on the X-ray. Such an opacity could indicate bacterial pneumonia, viral pneumonia, COVID-19, or other causes of opacities (including pulmonary embolism, pulmonary oedema, pleural effusion, or lung cancer). Radiologists and pulmonologists use various imaging features to differentiate these conditions, but with generally variable results. The problem can be detected, but the precise nature of the issue may remain unclear, requiring follow-up steps such as computed tomography (CT) scans, sputum analysis, comparison with medical records, concordance with clinical symptoms, bronchoscopy, and biopsy.

It is well known that experts in biomedical imaging may have varied interpretations and make errors. For example, the radiographic signs of COVID-19 are very similar to those of viral pneumonia, potentially leading to misdiagnosis. This ambiguity stems from the large variations that occur during the texture mapping process, when clinicians establish links between the visual observation of image patterns and the underlying cellular and molecular structures. These variations are partly due to the diversity of human biology, anatomy, and image acquisition and reconstruction protocols, compounded by observer training. Therefore, early diagnosis using chest X-ray imaging can be crucial to avoid false diagnoses and delays in treatment, which can lead to additional costs, effort, and risks.

Typical abnormal findings of COVID-19 pneumonia report that, in chest X-rays, parts of the lungs appear as “normal well-aerated parenchyma”, associated with areas of accentuation of the pulmonary interstitium characterised by fine linear structures representing the foci of pneumonia, in addition to the so-called ground-glass opacities typical of COVID-19 infection. Since these findings are among the first radiological manifestations of COVID-19 pneumonia, one can hypothesise that an accurate X-ray reading could help in the early/initial diagnosis of COVID-19 pneumonia on a routine basis, also providing the differential diagnosis with other non-COVID-19 pneumonias [24]. The disease course can be relatively rapid in some patients and slow in others. Usually, the most significant involvement in COVID-19 is perceived in the lower lung lobes, which evolve late into areas of pulmonary consolidation, predominantly peripheral and cloud-like, with bilateral pleural effusion [25]. Although COVID-19 pneumonia and other viral pneumonias share similar radiographic findings, in non-COVID-19 viral lung infections the accentuation of the pulmonary interstitium is typically at the bilateral parahilar level, with a progressive extension towards the periphery.

Although chest radiography is not sensitive enough to detect ground-glass opacity, which is the primary imaging feature of COVID-19 pneumonia, chest X-ray imaging is the preferred method for the follow-up of COVID-19 pneumonia patients admitted to intensive care units [26]. Computed tomography (CT) is another imaging method that provides better accuracy and detail. However, it implies a higher radiation dose: while a standard chest X-ray has a 0.02 mSv effective dose (in adults), a CT chest study has a 7 mSv effective dose [27]. Furthermore, it is not recommended that a chest CT scan be performed on frail patients in intensive care. Chest X-ray allows physicians to easily estimate the extent of alveolar opacity caused by infection, with the lowest radiation rate. When interpreting chest X-rays, it is essential to recognise particular clutter, artefacts, and ambiguities that can make diagnosing pneumonia difficult, especially when using artificial intelligence (AI) tools for automated detection. Motion artefacts, for example, can create pseudo-opacity due to the superposition of anatomical structures, particularly at the lung bases, such as the diaphragm. In addition, other artefacts are produced by the structures of the chest when X-rays pass through them. A typical case is the starburst effect around the costal arches, generated by the high bone density of these structures. Correct identification and management of these artefacts are essential to avoid false positives and optimise the accuracy of AI systems in detecting pneumonia.

Therefore, addressing the texture analysis based on tissue attenuation in chest X-ray images is complex and challenging. This work starts from the reasonable assumption that COVID-19, viral pneumonia, and lung opacity have tissues that exhibit a significantly different attenuation of X-rays compared to normal cases. More precisely, the hypothesis is that the relation between the singular values and the conditional indices of the image characterises this difference. These two quantities can, therefore, be used as texture features to classify COVID-19, viral pneumonia, lung opacity, and normal X-ray images. Note that this approach fits within the category of transform-based methods, see Table 1.

The main contribution of this work is an original approach to image texture analysis using singular values and conditional indices. The underlying idea is to show that the phenomenon of near-collinearity [21] characterises the variability of attenuation in X-ray images. As an application, we design an algorithm for quantifying the variability of tissue attenuation in chest X-ray images. The algorithm estimates the proportions of the singular values and the condition indices. These are used as features to classify images as COVID-19, viral pneumonia, lung opacity, or normal. To our knowledge, this approach has not been investigated before for texture analysis in chest X-ray images. In addition, a tuning weight parameter derived from the variance-decomposition proportions is proposed to improve the multiclass classification. More precisely, the estimation of this parameter relies on the coefficient of variation derived from the minimum covariance determinant. The determinant is calculated using the bandwidth parameter h, which comes from the non-parametric empirical distribution of the variance-decomposition proportions. Singular values and conditional indices are multiplied by this weight to account for the attenuation variability. We will show experimentally that this tuning mechanism improves the detection of respiratory syndromes.

The remainder of the paper is organised as follows. Sect 2 presents the experimental dataset (Sect 2.1) and the proposed method (Sect 2.2), where i) we review singular value decomposition (Sect 2.2.1), ii) introduce the likelihood-based scree plots (Sect 2.2.2), iii) present the non-parametric empirical distribution (Sect 2.2.3), iv) propose the coefficient of variation from the minimum covariance determinant (Sect 2.2.4), v) describe the derived features (Sect 2.2.5), vi) present the ReliefF feature selection algorithm (Sect 2.2.6), vii) describe the multiclass classification scheme (Sect 2.2.7), and finally, viii) review the partial dependence analysis method (Sect 2.2.8). Experimentation using chest X-ray images is presented in Sect 3, and the results are discussed. Finally, Sect 4 draws conclusions and future work.

2 Material and methods

2.1 Database

The Kaggle public database COVID-19 Radiography Dataset [28] was considered for experimentation purposes. The dataset consists of chest X-ray images with 3616 COVID-19 positive cases, 6012 lung opacity cases (non-COVID lung infection), 1345 viral pneumonia cases (non-COVID infection) along with 10192 normal cases. It was compiled from Qatar University - Doha, the University of Dhaka - Bangladesh, and their collaborators from Pakistan and Malaysia. Figs 1 and 2 show image sets of COVID-19, pneumonia, lung opacity, and healthy patients. See [29,30] for more information on the complete dataset.

Fig 1. Examples of images from best-case scenarios of the four categories in the dataset: (a) Normal chest; (b) COVID-19; (c) Non-COVID-19 viral pneumonia; (d) Lung opacity.

https://doi.org/10.1371/journal.pone.0320706.g001

Fig 2. Examples of images from worst-case scenarios of the four categories in the dataset: (a) Normal chest; (b) COVID-19; (c) Non-COVID-19 viral pneumonia; (d) Lung opacity.

https://doi.org/10.1371/journal.pone.0320706.g002

This dataset has many challenging characteristics. For example, in some cases, the images seem to be either viral pneumonia, bronchitis, or normal. The issue arises from the fact that the lung conditions of COVID-19 patients are particularly severe and pronounced, as shown in Figs 1 and 2. For instance, in Fig 1(a), there is a typical case showing no infiltrates or pulmonary opacities, with adequate aeration of both lung fields. Fig 1(b) shows a suspected COVID-19 case, with an alveolar-interstitial pattern in the upper, middle, and lower fields (predominantly left side). In Fig 1(c), there is an image suggestive of viral pneumonia (non-COVID), with a diffuse bilateral parahilar interstitial pattern. Fig 1(d) shows a non-pneumonia opacity, with slight accentuation of the bilateral parahilar peribronchovascular interstitial network, predominantly on the right. These are the best-case scenarios in their respective classifications, indicating the severity of the COVID-19 cases.

Similarly, Fig 2(a) shows a normal adult chest X-ray. The right perihilar reinforcement appears normal, caused by the pulmonary hilum being more exposed than the contralateral left hilum. In Fig 2(b), a patient is lying down for a chest posteroanterior (PA) view due to insufficient breathing, with numerous monitoring leads attached. In both lung fields, ground-glass interstitial infiltrates are observed at the vertices and bases that are usually linked to COVID-19. Fig 2(c) shows a standing patient for a chest posteroanterior (PA) view with relatively clean lungs, while a right parahilar interstitial infiltrate is present, representing indirect signs of air trapping. In Fig 2(d), a lack of aeration can be seen in both lung fields, where it is impossible to specify the predominant pattern/infiltrate or a particular aetiology. These represent the worst-case scenarios in their respective cases, suggesting that the normal cases are less severe. In essence, the dataset displays a clear difference in disease severity between COVID-19, viral pneumonia, lung opacity, and normal chest X-ray images.

2.2 Methodology

This work aims to find an original interpretable method to quantify the variability of tissue attenuation in chest X-ray images. The underlying idea is to assess the near dependencies among the image columns by calculating the large conditional indices. As mentioned, chest X-ray images of COVID-19, viral pneumonia, lung opacity, and normal cases are used for experimentation. For this purpose, a method composed of three stages is proposed (Fig 3). The first stage consists of applying the singular value decomposition to the X-ray image to obtain three parameters, namely the singular values (ζ), the conditional indices (η), and the variance-decomposition proportions matrix (Σ). In the second stage, a non-parametric empirical distribution is used as a dimension reduction of the variance-decomposition proportions matrix Σ. Next, the coefficient of variation based on the minimum covariance determinant is calculated from the bandwidth vector, yielding a single weight. According to our hypothesis (see Sect 1), this parameter represents tissue attenuation that leads to a near-collinearity phenomenon. It is used to tune the weight of the features from stage one. More precisely, the singular values and the conditional indices are multiplied by this weight to obtain our final feature vector. In the final stage, a classification scheme is used with this feature vector to distinguish COVID-19, viral pneumonia, lung opacity, and normal cases. These three stages are detailed in the following sections.

Fig 3. Block diagram of the proposed method illustrating the processing workflow.

https://doi.org/10.1371/journal.pone.0320706.g003

2.2.1 Singular value decomposition.

Let X ∈ ℝ^(N×K) be the known chest X-ray image matrix, where N is the number of rows, considered here as the number of observations, and K is the number of columns, viewed as the number of variables. By using the classical singular value decomposition (SVD) [31], X can be expressed as X = U S Vᵀ, where U ∈ ℝ^(N×N) and V ∈ ℝ^(K×K) are orthogonal matrices, and S ∈ ℝ^(N×K) includes the non-negative diagonal elements, representing the sorted singular values of X,

S = diag(ζ_1, ζ_2, …, ζ_K),   (1)

with ζ_1 being the largest singular value, and ζ_1 ≥ ζ_2 ≥ … ≥ ζ_K ≥ 0. The conditional indices identify the number and strength of any near dependencies between variables and are given by

η_k = ζ_max ∕ ζ_k,   (2)

where ζ_max = ζ_1, and k = 1, …, K. Note that the larger η_k is, the stronger the corresponding near-linear dependence.

The variance-decomposition proportions matrix Σ, related to V and S, is given by

π_jk = φ_kj ∕ φ_k,  with  φ_kj = v_kj² ∕ ζ_j²  and  φ_k = Σ_j φ_kj,   (3)

where v_kj are the elements of V, σ² is a given variance parameter (so that the variance associated with the k-th variable is σ² φ_k), and Σ = (π_jk).

Computing the SVD is the main computational cost component in the proposed method. When applied to large (M × N) images, direct SVD decomposition is costly given its O(MN min(M, N)) complexity. In this work, we used the augmented Lanczos bidiagonalization algorithm [32], implemented in the function svds in MATLAB®. This algorithm computes the first C singular values with a complexity of O(MNC), hence reducing the cost.
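The paper's implementation uses MATLAB's svds. As an illustration only, the same stage can be sketched in Python with SciPy's truncated SVD, deriving the singular values, the conditional indices of Eq (2), and the variance-decomposition proportions of Eq (3) restricted to the retained components (variable names `zeta`, `eta`, `Pi` are ours, not the paper's notation):

```python
import numpy as np
from scipy.sparse.linalg import svds

def svd_features(X, C=5):
    """Truncated SVD features: singular values, condition indices, proportions."""
    u, s, vt = svds(X.astype(float), k=C)
    order = np.argsort(s)[::-1]                 # svds does not guarantee descending order
    zeta = s[order]                             # singular values, descending
    V = vt[order].T                             # corresponding right singular vectors (K x C)
    eta = zeta[0] / zeta                        # condition indices: zeta_max / zeta_k
    phi = V**2 / zeta**2                        # phi_kj = v_kj^2 / zeta_j^2
    Pi = phi / phi.sum(axis=1, keepdims=True)   # proportions over the retained components
    return zeta, eta, Pi

rng = np.random.default_rng(1)
X = rng.random((299, 299))                      # stand-in for a 299x299 X-ray image
zeta, eta, Pi = svd_features(X, C=5)
```

Note that with a truncated decomposition the proportions are normalised over the retained C components only, rather than all K.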

2.2.2 Likelihood-based Scree plots.

The scree plot [33] is a graphical representation of the eigenvalues against the component number, used to estimate the appropriate number of principal components to retain based on the elbow or break in its graph. Because the eigenvalues decrease monotonically from the first to the last, a breaking point is often (though not always) created where the curve begins to level off; see Fig 4. This break or elbow allows choosing the number of significant components, and its position can be estimated automatically using the profile likelihood function [34]. Components that appear before the break or elbow are assumed to be significant and are retained for data interpretation. In contrast, components that appear after it are assumed unimportant and are therefore not retained. Scree plots are helpful when there is an apparent significant deviation in the variation explained by the components [35]. For a comprehensive treatment of the scree plot, see [36,37].

Fig 4. Example of a Scree plot of singular values, highlighting an elbow at the second component.

https://doi.org/10.1371/journal.pone.0320706.g004

The profile likelihood function is defined as follows. Let d = (d_1, …, d_p) be the p-dimensional vector that contains the ordered measurements of importance values or significant components, described as d_1 ≥ d_2 ≥ … ≥ d_p. The problem is determining how many components to retain according to the break or elbow on the waveform. Then, for a fixed number 1 ≤ q ≤ p, if a break or elbow exists at position q, {d_1, …, d_q} and {d_(q+1), …, d_p} are defined as independent samples from two different distributions, called f(d; θ_1) and f(d; θ_2); then the log-likelihood function under the naïve independence assumption can be written as

ℓ_q(θ_1, θ_2) = Σ_{i=1..q} log f(d_i; θ_1) + Σ_{i=q+1..p} log f(d_i; θ_2).   (4)

For any given q, the maximum likelihood estimates θ̂_1 and θ̂_2 can be obtained separately from {d_1, …, d_q} and {d_(q+1), …, d_p}. By plugging these estimates into Eq (4), a profile log-likelihood for q yields:

ℓ_q = Σ_{i=1..q} log f(d_i; θ̂_1) + Σ_{i=q+1..p} log f(d_i; θ̂_2).   (5)

Computing ℓ_q in Eq (5) for every candidate position, q can be estimated by maximizing the profile log-likelihood as

q̂ = argmax_{1≤q≤p} ℓ_q.   (6)
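A minimal sketch of the estimator in Eqs (4)–(6), assuming Gaussian segment densities with a pooled common variance (a usual choice for this profile-likelihood elbow method; the authors' exact distributional assumptions are not restated here):

```python
import numpy as np

def profile_likelihood_elbow(d):
    """Return the position q maximizing the Gaussian profile log-likelihood."""
    d = np.asarray(d, float)
    p = len(d)
    best_q, best_ll = 1, -np.inf
    for q in range(1, p):
        a, b = d[:q], d[q:]
        # pooled MLE variance of the two segments (common-sigma assumption)
        var = (((a - a.mean())**2).sum() + ((b - b.mean())**2).sum()) / p
        var = max(var, 1e-12)
        ll = -0.5 * p * np.log(2 * np.pi * var) - 0.5 * p
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Singular values with a clear elbow after the second component
q_hat = profile_likelihood_elbow([10.0, 8.0, 0.9, 0.8, 0.7, 0.6])
```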

2.2.3 Non-parametric empirical distribution.

At this stage, we consider the distribution of the variance-decomposition proportions (i.e., the elements of Σ). This matrix is reorganised into a vector x = (x_1, …, x_n). Let f̂_h be the density estimated from the observed data x,

f̂_h(x) = (1 ∕ nh) Σ_{i=1..n} K((x − x_i) ∕ h),   (7)

where x_1, …, x_n are considered random samples from the unknown distribution, K(.) is the kernel smoothing function, and h is the smoothing parameter or bandwidth that controls the variance of the kernel.

A Gaussian kernel K ≈ N(0, h), with zero mean and standard deviation h, was used to build a smoothing function to represent the probability distribution of the input data.

The smoothing parameter, or bandwidth h, determines how the probability associated with each observation [38] (i.e., in our case, proportion) is spread over the surrounding sample space. Assuming that f is a normal density function, then h can be expressed as

h = σ̂ (4 ∕ 3n)^(1∕5),   (8)

where σ̂ can be estimated as follows [38],

σ̂ = mad(x) ∕ 0.6745,   (9)

with mad denoting the median absolute deviation of the sample.
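The bandwidth rule of Eqs (8)–(9) can be sketched in Python as follows. The 0.6745 normalisation, which makes the MAD consistent with the standard deviation of a normal sample, is our assumption about the exact robust scale estimate used:

```python
import numpy as np

def bandwidth(x):
    """Normal-reference bandwidth with a MAD-based robust scale (sketch)."""
    x = np.asarray(x, float)
    n = len(x)
    sigma = np.median(np.abs(x - np.median(x))) / 0.6745  # robust sigma estimate
    return sigma * (4.0 / (3.0 * n)) ** 0.2               # h = sigma * (4/3n)^(1/5)

rng = np.random.default_rng(2)
h = bandwidth(rng.normal(0.0, 1.0, size=1000))
```

For a standard normal sample of size 1000, h comes out near 0.27, matching the well-known normal reference rule.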

2.2.4 Coefficient of variation based on the minimum covariance determinant.

Let X ∈ ℝ^(n×p) be the dataset matrix, where n stands for the number of observations and p for the number of variables. The minimum covariance determinant (MCD) algorithm is a fast, robust estimator of (uni)multi-variate location and scatter [39]. The underlying idea of MCD is to find h observations out of n, such that [(n+p+1) ∕ 2] ≤ h ≤ n, whose covariance matrix has the lowest possible determinant; the corresponding mean (μ_MCD) and covariance (Σ_MCD) are then taken as the robust estimates.

The parameter c_α, called the consistency factor, has two targets. The first is to obtain consistency under the (uni)multi-variate normal distribution, and the second is to correct for bias at small sample sizes.

The parameters μ_MCD and Σ_MCD are estimated from the elliptically symmetric unimodal distribution of the (uni)multi-variate data as follows

μ_MCD = Σ_i w(d_i²) x_i ∕ Σ_i w(d_i²),   (10)

Σ_MCD = c_α (1 ∕ n) Σ_i w(d_i²) (x_i − μ_MCD)(x_i − μ_MCD)*,   (11)

where d_i = d(x_i, μ, Σ) is the Mahalanobis distance, w(d²) = I(d² ≤ χ²_(p,α)) is a weight function with I as the indicator function, ∗ denotes the complex conjugate (transpose), χ²_(p,α) is the α-quantile of the Chi-squared distribution with p degrees of freedom, with α = h ∕ n, and h taken such that [(n+p+1) ∕ 2] ≤ h ≤ n. To comprehensively review the MCD analysis method, consult [39–42].

The coefficient of variation (CV) ratio, based on the MCD, indicates the covariance dispersion of the data from the mean value. It is given by

CV_MCD = √Σ_MCD ∕ μ_MCD.   (12)
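For the univariate case used later (the bandwidth vector h), the MCD-based coefficient of variation of Eq (12) can be sketched as follows. This toy implementation scans contiguous windows of the sorted sample, whose minimum-variance window is the exact univariate MCD subset, and omits the consistency factor c_α for simplicity:

```python
import numpy as np

def cv_mcd(x, m=None):
    """Univariate MCD coefficient of variation: robust std / robust mean (sketch)."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    m = m or (n + 2) // 2                    # default subset size ~ (n+p+1)/2 with p=1
    # contiguous window of the sorted data with the smallest variance
    var, i = min((x[i:i + m].var(), i) for i in range(n - m + 1))
    mu = x[i:i + m].mean()
    return np.sqrt(var) / mu

x = np.concatenate([np.full(8, 10.0) + np.arange(8) * 0.1, [100.0]])  # one outlier
omega = cv_mcd(x)
```

Because the outlier is excluded from the minimum-variance window, the robust CV stays small, unlike the classical CV computed on the full sample.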

2.2.5 Tuning weight.

As stated above, the experimental approach taken in this work is to differentiate COVID-19, viral pneumonia, lung opacity, and normal cases in chest X-ray images by identifying the near dependencies between the image columns of X. This identification is done by calculating the singular values and the associated large conditional indices following Eqs (1) and (2). Additionally, the variance-decomposition proportions Σ (Eq (3)) greater than a threshold of 0.5 are considered. These proportions are related to the groups of variables (i.e., image columns) involved in near dependencies. A single tuning weight ω is introduced to improve the classification for these situations. Note that the retained proportions form a vector of size M × 1 with M values greater than 0.5. To estimate ω, the non-parametric empirical distribution (Sect 2.2.3) is estimated for each variance-decomposition proportion, and the bandwidth vector h = (h_1, …, h_M) is obtained following Eqs (8) and (9). The tuning weight ω is then estimated as the coefficient of variation based on the minimum covariance determinant (Eq (12)), denoted CV_MCD, of the univariate vector h:

ω = CV_MCD(h).   (13)

The resulting parameter feature vector is

Θ_k = (ω ζ_k, ω η_k),   (14)

where 1 ≤ k ≤ Ψ. Please note that Ψ is the number of the first most representative components. Because the number Ψ of components is not known in advance, the scree plot using the profile likelihood function is estimated; see Sect 2.2.2. Note that the elements of the vector ζ are in descending order, while the components of the vector η are in ascending order.
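Putting Eqs (13) and (14) together, the assembly of the final feature vector can be sketched as follows (ω and Ψ are supplied by the earlier stages; the numeric values below are illustrative only):

```python
import numpy as np

def feature_vector(zeta, eta, omega, psi=5):
    """Scale the first psi singular values and condition indices by omega."""
    zeta = np.asarray(zeta, float)[:psi]     # singular values, descending
    eta = np.asarray(eta, float)[:psi]       # condition indices, ascending
    return omega * np.concatenate([zeta, eta])

zeta = np.array([9.0, 4.0, 2.0, 1.0, 0.5])   # illustrative singular values
eta = zeta[0] / zeta                          # condition indices (Eq 2)
theta = feature_vector(zeta, eta, omega=0.2)  # 10 features per image
```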

2.2.6 Feature relevance.

The ReliefF algorithm is used to select relevant texture features for classification. This algorithm is based on a filter-method approach [43]. It works by calculating a weight W_j for each feature j = 1, …, p. Given a dataset of n instances of p features, the algorithm iteratively selects one random instance x_r. It locates each class’s instances closest to x_r and considers their feature vectors.

Initially zero, the weight W_j at iteration i is updated according to Eq (15) or Eq (16) depending on the class of the neighbour x_q:

W_j^(i) = W_j^(i−1) − Δ_j(x_r, x_q) d_rq ∕ m,  if x_r and x_q belong to the same class,   (15)

W_j^(i) = W_j^(i−1) + (P(c_q) ∕ (1 − P(c_r))) Δ_j(x_r, x_q) d_rq ∕ m,  otherwise,   (16)

with

Δ_j(x_r, x_q) = |x_rj − x_qj| ∕ (max(x_j) − min(x_j)),   (17)

d_rq = exp(−(rank(r, q) ∕ ϵ)²) ∕ Σ_{l=1..k} exp(−(rank(r, l) ∕ ϵ)²),   (18)

where P(c_r) and P(c_q) are the prior probabilities of the classes to which x_r and x_q belong, respectively; m is the number of iterations; Δ_j(x_r, x_q) is the difference of the value of the feature j between observations x_r and x_q; x_rj and x_qj denote the value of the feature j for the observations x_r and x_q, respectively; d is a distance function such as Euclidean; rank(r, q) is the position of instance q among the nearest neighbours of the observation r, sorted by distance; k is the number of nearest neighbours; ϵ is the distance scaling factor; note that for all nearest neighbours to have the same influence, ϵ = ∞. This feature-weighting procedure will allow us to assess the relative importance of the features used with our classification method. For a complete treatment of the ReliefF algorithm, read [44,45].
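A deliberately simplified Relief sketch (binary classes, a single nearest hit and miss, uniform neighbour influence, i.e. ϵ = ∞) illustrating the weight-update idea of Eqs (15)–(18); the full ReliefF additionally handles multiple classes, k neighbours, class priors, and rank-based distance weights:

```python
import numpy as np

def relief(X, y, m=50, seed=0):
    """Simplified Relief: reward features that separate the nearest miss, not the hit."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # feature ranges for Eq (17)
    w = np.zeros(p)
    for _ in range(m):
        r = rng.integers(n)
        d = np.abs(X - X[r]).sum(axis=1)           # Manhattan distances to x_r
        d[r] = np.inf                              # exclude x_r itself
        hit = np.argmin(np.where(y == y[r], d, np.inf))
        miss = np.argmin(np.where(y != y[r], d, np.inf))
        w += (np.abs(X[r] - X[miss]) - np.abs(X[r] - X[hit])) / span / m
    return w

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.05 * rng.normal(size=100),   # informative feature
                     rng.normal(size=100)])             # pure-noise feature
w = relief(X, y)
```

The informative feature receives a clearly larger weight than the noise feature, which is the behaviour exploited for feature selection.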

2.2.7 Multiclass classification method.

The final stage of our proposed method consists of using the feature vector developed above to classify X-ray images. The dataset used in this study is imbalanced: COVID-19 to normal cases has a ratio of 1:3, COVID-19 to viral pneumonia a ratio of 3:1, and COVID-19 to lung opacity a ratio of 1:2. The well-known ensemble learning method based on bagged trees was considered for its good performance in various applications and its ability to handle imbalanced classes. The associated performance was evaluated according to the following metrics [46–48]: True Positive Rate (or recall, or sensitivity), False Negative Rate, Positive Predictive Values, False Discovery Rate, Area Under Curve, Accuracy Rate, and Total Cost. The execution time is given for comparison purposes, based on a standard PC.
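The bootstrap-aggregation idea behind bagged trees can be illustrated with a toy sketch in which one-feature threshold “stumps” stand in for the full decision trees; all names and data below are illustrative, not the paper's model:

```python
import numpy as np

def fit_stump(X, y):
    """Pick the single feature/threshold split with the best training accuracy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            acc = (pred == y).mean()
            acc, flip = (acc, 0) if acc >= 0.5 else (1 - acc, 1)
            if best is None or acc > best[0]:
                best = (acc, j, t, flip)
    return best[1:]

def predict_stump(stump, X):
    j, t, flip = stump
    pred = (X[:, j] > t).astype(int)
    return 1 - pred if flip else pred

def bagged_predict(X, Xtr, ytr, B=15, seed=0):
    """Train B stumps on bootstrap resamples and take the majority vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X))
    for _ in range(B):
        idx = rng.integers(len(Xtr), size=len(Xtr))   # bootstrap sample
        votes += predict_stump(fit_stump(Xtr[idx], ytr[idx]), X)
    return (votes > B / 2).astype(int)

rng = np.random.default_rng(5)
y = np.repeat([0, 1], 40)
X = np.column_stack([y + 0.3 * rng.normal(size=80), rng.normal(size=80)])
yhat = bagged_predict(X, X, y)
```

Averaging many learners trained on resampled data is what stabilises the ensemble against variance, the property for which bagged trees were chosen here.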

2.2.8 Partial dependence analysis.

To analyse the behaviour of the classification model and assess the effect of the two features on the predictions, we performed partial dependence analysis. This technique is used in machine learning to interpret the relationship between a single feature (or a subset of features) and the model’s prediction. It helps to understand how the given feature (or subset of features) influences the predictions of the model while averaging out the effects of the other features. It consists of creating and interpreting partial dependence plots, provided in Sect 3. The technique is utilised to understand feature importance and how features contribute to predictions. It is also essential for validating the behaviour of models in healthcare applications.

Fig 5. Fit of the non-parametric empirical distribution to the variance-covariance matrix data: Class 0 (normal, blue dots), Class 1 (COVID-19, red dots), Class 2 (viral pneumonia, purple dots), and Class 3 (lung opacity, yellow dots).

https://doi.org/10.1371/journal.pone.0320706.g005

Let f(Θ) denote a predictive model with feature vector Θ = (θ_1, …, θ_p). For binary classification, f(Θ) is the probability estimate P(y = 1 | x). Let Θ_S denote a small subset of Θ, with its complement Θ_C = Θ \ Θ_S, of size p − |Θ_S|. The partial dependence between Θ_S and the model’s output is defined as:

f_S(Θ_S) = E_{Θ_C}[ f(Θ_S, Θ_C) ] (19)

where E_{Θ_C} is the expectation taken over the marginal distribution of Θ_C in the dataset. In practice, given the discrete nature of the data, f_S(Θ_S) is computed as the average over all n observed values Θ_C^(i) of Θ_C:

f_S(Θ_S) = (1/n) Σ_{i=1}^{n} f(Θ_S, Θ_C^(i)) (20)

For K classes, the generalised partial dependence for class k is:

f_S^k(Θ_S) = (1/n) Σ_{i=1}^{n} P(y = k | Θ_S, Θ_C^(i)) (21)

Eq 21 means that for each class k, the probability P(y = k) is marginalised over the complementary features Θ_C to analyse how P(y = k) varies with Θ_S. We refer the reader to [19] for a comprehensive treatment of partial dependence analysis.

In our study, Θ = (ζ, η) and K = 4. We calculated and displayed the partial dependence plots for each class to assess the effect of the singular values and the conditional indices on the model’s decision. We performed this analysis with and without applying the tuning weight. Results and discussion are provided in Sect 3.
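The averaging in Eq 20 can be implemented directly. The sketch below is illustrative, with a toy stand-in model in place of the trained classifier’s probability output:

```python
def partial_dependence(model, X, feature, grid):
    """Eq 20 sketch: for each grid value of the chosen feature, clamp that
    feature, keep the complementary features at their observed values, and
    average the model output over the dataset."""
    pd = []
    for v in grid:
        total = 0.0
        for row in X:
            z = list(row)
            z[feature] = v  # clamp the feature of interest to v
            total += model(z)
        pd.append(total / len(X))
    return pd
```

For K classes (Eq 21), the same loop is applied to the predicted probability of each class k in turn.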

3 Results and discussion

This section reports the evaluation results of the proposed method using the previously introduced database. The dataset comprises chest X-ray images distributed as follows: 3,616 COVID-19, 1,345 viral pneumonia, 6,012 lung opacity, and 10,192 normal cases. Each chest X-ray image has a size of N × K (where N = K = 299). Images were converted to grayscale, retaining the luminance and eliminating the hue and saturation information.
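A common way to retain luminance while discarding hue and saturation is the ITU-R BT.601 luma weighting; we assume weights of this form for illustration:

```python
def rgb_to_luminance(r, g, b):
    """Weighted sum of the RGB channels; hue and saturation are discarded.
    Weights follow the ITU-R BT.601 luma definition (an assumption here)."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b
```

Applied pixel-wise, this maps each RGB image to a single grayscale channel on the same intensity scale.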

We ran two experimental scenarios: the first without applying the tuning weights (scenario one) and the second with them (scenario two). For each scenario, the proposed features were used to train and test the classification model to discriminate between the four classes. In the first scenario, each image was reduced to two features, (ζ, η). In the second scenario, the features were weighted using the tuning parameter ω, giving (ωζ, ωη). The scree plot method (see Sect 2.2.2) yields a boundary number of Ψ = 5, related to the five most representative singular values, see Fig 4. This leads to 10 features for each image, five times the two parameters. Remember that ω is given by the coefficient of variation from the minimum covariance determinant of the smoothing parameters h (see Sects 2.2.3, 2.2.4, and 2.2.5). For illustration, Fig 5 depicts the non-parametric distribution fitted to the variance-covariance matrix data. While the curves corresponding to the four classes appear very similar, they exhibit subtle variations that may not be visually perceptible. These variations are confirmed by the statistics of the smoothing parameter h (see Table 2). Although minimal, these differences are sufficient for the ensemble model to capture class-specific characteristics. The parameter ω amplifies these variations, enhancing the model’s ability to learn distinctive patterns for each class.
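As a simplified sketch of how ω is obtained and applied (using plain sample statistics in place of the robust minimum-covariance-determinant estimates used in the method):

```python
import statistics

def tuning_weight(h):
    """Coefficient of variation (std / mean) of the smoothing parameters h.
    The paper derives these statistics from the minimum covariance
    determinant estimator; plain sample statistics are used here."""
    return statistics.pstdev(h) / statistics.fmean(h)

def weight_features(features, omega):
    """Rescale the per-image feature vector by the class tuning weight omega."""
    return [omega * f for f in features]
```

Each class yields its own ω from the distribution of its smoothing parameters, and the features of every image are then multiplied by the corresponding weight.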

thumbnail
Table 2. Statistics of the smoothing parameter h across all images and classes.

https://doi.org/10.1371/journal.pone.0320706.t002

Before applying the classification, the ability of the features to discriminate the four classes was analysed. Fig 6 shows the nonlinear scatter plots of the feature vectors over 105,825 observations. Figs 6(a) and 6(d) show how the tuning weight parameter works. One can notice how the cases of normal (blue dots), COVID-19 (red dots), lung opacity (yellow dots), and viral pneumonia (purple dots) can be easily differentiated in the initial part, for singular values (ζ) between 0 and 9, as shown in Fig 6(b). However, the cases are very close in the final part, as shown in Fig 6(c). In the feature vector without the tuning weight, the singular values (ζ) and the conditional indices (η) each lie within bounded intervals (see Table 3). Large indices identify near dependencies among the columns of X; therefore, the size of the indices measures the strength of these near dependencies across all X-ray images under study, especially COVID-19. The p-values were estimated using the t-test method [49], with p-value = 0.0325 for ζ and p-value = 0.0001 for η, indicating high significance. Fig 7 shows the rank importance of the predictors from the ReliefF algorithm. One can notice that, without the tuning weight parameter ω, the conditional indices are more representative than the singular values, as seen in Fig 6(b). All conditional indices can easily be differentiated for each case of study. However, singular values become more representative when applying the tuning weight parameter, as seen in Fig 6(d). At first, the classes are even more separated, but in the end, the separation is optimal for all cases, especially COVID-19. Note that both conditional indices and singular values were rescaled. For illustration, Table 3 shows the different thresholds for the two experimental scenarios (with and without the tuning weight parameter). The mean and standard deviation are high in both scenarios for the COVID-19 class, which allows them to be used in a threshold detection scheme [50] to differentiate COVID-19 from the other classes under study.
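The conditional indices used as features follow the standard multi-collinearity definition, the ratio of the largest singular value to each singular value; a minimal sketch:

```python
def condition_indices(singular_values):
    """eta_j = sigma_max / sigma_j: large values flag near dependencies
    among the columns of the data matrix X."""
    s_max = max(singular_values)
    return [s_max / s for s in singular_values]
```

A well-conditioned matrix yields indices close to 1, while near-singular texture structure produces large indices, which is the behaviour exploited for class discrimination here.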

thumbnail
Fig 6. Scatter plot of the feature vector using the complete dataset: Class 0 (normal, blue dots), Class 1 (COVID-19, red dots), Class 2 (viral pneumonia, purple dots), and Class 3 (lung opacity, yellow dots). (a) Without the tuning weight parameter ω. (d) With the tuning weight parameter ω. The plots have similar shapes but exhibit a drift that aids in differentiating classes.

https://doi.org/10.1371/journal.pone.0320706.g006

thumbnail
Fig 7. Relative feature importance computed by ReliefF using the complete dataset.

(a) Without the tuning weight parameter ω. (b) With the tuning weight parameter ω.

https://doi.org/10.1371/journal.pone.0320706.g007

For the classification stage, the dataset was randomly split into 90% (95,243 observations) for training and 10% (10,582 observations) for testing. The ensemble learning model, evaluated using 10-fold cross-validation, demonstrated promising results. Both experimental scenarios were assessed using standard performance metrics, including True Positive Rate (recall/sensitivity), False Negative Rate, Positive Predictive Value, False Discovery Rate, Area Under the Curve (AUC), Accuracy, and Total Cost. The confusion matrices, shown in the Supporting Information section in S1 Fig for the training and S2 Fig for the testing, indicate high performance across all classes with minimal false responses. S3 Fig presents the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) for both training and testing stages without the weight parameter ω. All ROC curves exhibit high True Positive Rates and low False Positive Rates, with AUC values close to one for all classes. These findings suggest that the proposed model effectively distinguishes between the studied classes.

Table 4 summarises these results, highlighting that the False Negative Rate and False Discovery Rate remain low, while the tuning parameter substantially reduces the total cost.

thumbnail
Table 3. Thresholds for all data with and without the tuning weight parameter (ω). Feature values for SVD (ζ) and the conditional index (η) differ between classes and can serve in a threshold detection scheme.

https://doi.org/10.1371/journal.pone.0320706.t003

thumbnail
Table 4. Performance metric averages: TPR- True Positive Rate (or recall, or sensitivity), FNR- False Negative Rate, PPV- Positive Predictive Values, FDR- False Discovery Rate, AUC- Area Under Curve, ACC- Accuracy rate, and Total Cost.

https://doi.org/10.1371/journal.pone.0320706.t004

The rescaled feature vector (i.e., multiplied by the tuning weight) improved the distinction between the four classes, especially COVID-19, as shown in Fig 6(d). The tuning weight ω was found equal to {0.0142; 0.0173; 0.0143; 0.0161}, respectively, for the normal, COVID-19, viral pneumonia, and lung opacity classes. The singular values and the conditional indices were found in the ranges [0.003, 0.267] and [0.061, 1.208], respectively. The rescaling mapped the features onto a different scale, where the classes are better discriminated, and significantly improved the performance of the ensemble learning classifier. S4 and S5 Figs, in the "Supporting Information" section, show the confusion matrices for training and testing, respectively. These results are aligned with the observed class separation shown in Fig 6(d).

Figs 8 and 9 show the partial dependence plots of the two features, without and with the tuning weight. The curves show the effect of the features on predicting each class separately. Figs 8(a) and 8(b) compare the effect of the tuning weight applied to the conditional indices on the model’s behaviour. We can observe that the model behaves similarly for conditional indices between 0 and 20 and their corresponding weighted values. In this range, the model favours the "lung opacity" class (up to index 8), then has more confidence in predicting "viral pneumonia" (between 8 and 18), and then switches to predicting "COVID-19" with high confidence (around index 18). However, after applying the tuning weight, the effect is obvious: the model predicts "COVID-19" with high confidence and high probability, starting from the weighted value of 0.02. This observation aligns with our results, showing that the classification model with weighted features performs significantly better.

Similarly, when comparing Figs 9(a) and 9(b), the model behaves similarly with and without the tuning weight for singular values between 0 and 4. In this range, the model first favours "COVID-19" with significant confidence, then becomes more confident in predicting the "normal chest" class with significant probability. However, with the weighted singular values, the model predicts the "normal chest" class with high confidence and probability. Using the weighted conditional indices jointly with the weighted singular values, it appears clear that the model finds a combined pattern that allows it to differentiate the four classes and, more importantly, "COVID-19" from "normal chest".

Considering all the plots from Figs 8 and 9, it is reasonable to hypothesise that using the four features together will be useful for learning complex image patterns. However, we did not test this hypothesis in this study, as the results were good with only the weighted features.

thumbnail
Fig 8. Partial dependency plots of conditional indices during testing, with and without the tuning weight ω: Class 0 (normal, blue dots), Class 1 (COVID-19, red dots), Class 2 (viral pneumonia, purple dots), and Class 3 (lung opacity, yellow dots).

The x-axis values are the conditional indices in (a) and the weighted conditional indices in (b). Values on the y-axis are probabilities of predicting a class. For example, in (a), the probability of predicting COVID-19 is 0.9 when the conditional index is around 18. Curves show the variation of the probabilities of predicting each class depending on the values of the feature. Figure (b) shows that for values of the weighted conditional indices starting from 0.2, the model is highly confident in predicting COVID-19, with a high probability (more than 0.8).

https://doi.org/10.1371/journal.pone.0320706.g008

thumbnail
Fig 9. Partial dependency plots of singular values during testing, with and without the tuning weight ω: Class 0 (normal, blue dots), class 1 (COVID-19, red dots), class 2 (viral pneumonia, purple dots), and class 3 (lung opacity, yellow dots).

The x-axis values are singular values in (a) and weighted singular values in (b). Values on the y-axis are probabilities of predicting a class. For example, in (a), the probability of predicting COVID-19 is 0.9 when the singular value is around 0.40. Curves show the variation of the probabilities of predicting each class depending on the values of the feature. Comparing plots (a) and (b), we notice that the behaviour of the model is similar in the range [0, 5]. However, Figure (b) shows that for weighted singular values between [0.005, 0.010], the model is highly confident in predicting normal chest, with high probability (close to 0.8).

https://doi.org/10.1371/journal.pone.0320706.g009

In texture analysis, classification approaches are based on spatial localisation, used in edge-detection methods, and on discrimination-function features. The main challenge of spatial localisation lies in distinguishing texture boundaries from micro-edges found within the same texture, while discrimination functions depend on the discriminative capacity of their texture characteristics. In this way, X-ray image class types can be defined at anatomical or disease levels and include several cell and tissue types. This work focuses specifically on the characteristics given by singular values and conditional indices, which allows us to suggest that the features under study can be used for texture discrimination between different classes. Table 5 shows some examples of classifiers and feature extraction methods for X-ray images. Remarkably, our method’s scores are higher than those of CNN-based deep learning methods, which are currently at the forefront of X-ray image classification; such methods generally outperform conventional feature-based methods, but at higher computational cost. The reader is referred to [51–53] for recent reviews of deep learning models for chest disease detection using X-ray images.

thumbnail
Table 5. State-of-the-art methods for X-ray image classification, summarised in terms of the classifier, preprocessing, and feature extraction used, and their performance on different datasets. CLAHE: Contrast Limited Adaptive Histogram Equalization. DT: Decision Tree. HOG: Histogram of Oriented Gradients. WMF: Weighted Median Filtering. LSTM: Long Short-Term Memory. PWLGBP: Weighted Local Gabor Binary Pattern. ENNSA: Ensemble Neural Net Sentinel Algorithm. IGLCM: Insistent Grey Level Co-occurrence Matrix. DF-GAN: Deep Fusion Generative Adversarial Networks. The performance metrics are the True Positive Rate, recall, or Sensitivity (TPR), the True Negative Rate, Negative Recall, or Specificity (TNR), and the Accuracy Rate (ACC).

https://doi.org/10.1371/journal.pone.0320706.t005

4 Conclusions

This work proposed an original method to classify chest X-ray images corresponding to COVID-19, viral pneumonia, lung opacity, and normal cases. The proposed method is based on two features estimated using an SVD decomposition: the singular values and the conditional indices. Traditionally, these parameters are used for multi-collinearity diagnosis in regression; this work uses them as features to characterise image texture. In addition, we introduced a tuning weight parameter to account for the variability of X-ray tissue attenuation. This weight is estimated using the coefficient of variation of the minimum covariance determinant from the bandwidth of the non-parametric distribution of variance-decomposition proportions. The resulting features were used with an ensemble learning method to classify normal, COVID-19, viral pneumonia, and lung opacity X-ray images. Performance was assessed using True Positive Rate, False Negative Rate, Positive Predictive Values, False Discovery Rate, Area Under Curve, Accuracy Rate, and Total Cost. On average, the method achieved an accuracy of 88% without the tuning weight and 99% with it. This result suggests that the proposed method can be used as an efficient texture discriminator to characterise different tissues in X-ray images, especially in respiratory syndromes.

In addition to its excellent performance, the proposed method has a decent computational cost with imbalanced data compared to current deep learning methods. Computer-based quantitative image texture analysis has an important potential to improve image interpretation by yielding reproducible results.

The proposed method’s main limitations are that it does not explicitly consider physiological and non-physiological artefacts, clutter, and ambiguities in the images, and it is difficult to deal with highly imbalanced data. In the medical context, there is a lack of clear definitions of biomedical texture information for validation and translation to routine clinical applications. There is also a lack of an appropriate framework for multiscale and multispectral analysis in 2D and 3D images. Computer-based quantitative imaging features can be challenging to interpret, as they can appear abstract to a physician, and their meaning in the clinical context may not be directly apparent.

Future work will focus on a more extensive evaluation of the proposed approach and on the study of robust feature extraction methods using an approximation of the image by a sparse low-rank matrix [74–77]. Additional work can target image segmentation coupled with soft-tissue decomposition to identify tissue-at-risk regions affected by COVID-19 and other respiratory pathologies, where lung images exhibit patches. This would also allow studying taste receptors and their relationship with the walls of blood vessels in the lungs, as well as the characterisation of acute respiratory distress syndrome (ARDS) [78].

Supporting information

S1 Fig. Confusion Matrices without the tuning weight parameter during training for Class 0 (normal), Class 1 (COVID-19), Class 2 (viral pneumonia), and Class 3 (lung opacity).

https://doi.org/10.1371/journal.pone.0320706.s001

(PDF)

S2 Fig. Confusion Matrices without the tuning weight during testing for Class 0 (normal), Class 1 (COVID-19), Class 2 (viral pneumonia), and Class 3 (lung opacity).

https://doi.org/10.1371/journal.pone.0320706.s002

(PDF)

S3 Fig. Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC) of the features without the tuning weight ω.

https://doi.org/10.1371/journal.pone.0320706.s003

(PDF)

S4 Fig. Confusion Matrices with the tuning weight parameter during training for Class 0 (normal), Class 1 (COVID-19), Class 2 (viral pneumonia), and Class 3 (lung opacity).

https://doi.org/10.1371/journal.pone.0320706.s004

(PDF)

S5 Fig. Confusion Matrices with the tuning weight parameter during testing for Class 0 (normal), Class 1 (COVID-19), Class 2 (viral pneumonia), and Class 3 (lung opacity).

https://doi.org/10.1371/journal.pone.0320706.s005

(PDF)

To reproduce these results, please use: https://github.com/tonioquinterorincon/Computer-based-quantitative-image-texture-analysis-using-multi-collinearity-diagnosis-in-chest-X-ray

Acknowledgments

The authors are grateful to the Faculty of Engineering and Agricultural Sciences at the Pontifical Catholic University of Argentina (UCA) for providing financial support for the publication of this work.

References

  1. 1. Depeursinge A, Al-Kadi OS, Mitchell JR. Biomedical texture analysis. Fundamentals, tools and challenges. Academic Press; 2017.
  2. 2. Kumar S, Bhandari AK. Automatic tissue attenuation-based contrast enhancement of low-dynamic X-ray images. IEEE Trans Radiat Plasma Med Sci 2022;6(5):574–82.
  3. 3. Li L, Lv M, Ma H, Jia Z, Yang X, Yang W. X-ray image enhancement based on adaptive gradient domain guided image filtering. Appl Sci 2022;12(20):10453.
  4. 4. Liu M, Mei S, Liu P, Gasimov Y, Cattani C. A new X-ray medical-image-enhancement method based on multiscale Shannon-cosine wavelet. Entropy (Basel) 2022;24(12):1754. pmid:36554159
  5. 5. Huang C-C, Nguyen M-H. X-ray enhancement based on component attenuation, contrast adjustment, and image fusion. IEEE Trans Image Process 2019;28(1):127–41. pmid:30130186
  6. 6. Sun H, Ren G, Teng X, Song L, Li K, Yang J, et al. Artificial intelligence-assisted multistrategy image enhancement of chest X-rays for COVID-19 classification. Quant Imaging Med Surg 2023;13(1):394–416. pmid:36620146
  7. 7. Wang L, Lin ZQ, Wong A. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep 2020;10(1):19549. pmid:33177550
  8. 8. De Moura J, Garcia LR, Vidal PFL, Cruz M, Lopez LA, Lopez EC, et al. Deep convolutional approaches for the analysis of COVID-19 using chest X-ray images from portable devices. IEEE Access. 2020;8:195594–607. pmid:34786295
  9. 9. Abbas A, Abdelsamea MM, Gaber MM. Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl Intell (Dordr) 2021;51(2):854–64. pmid:34764548
  10. 10. Toğaçar M, Ergen B, Cömert Z, Özyurt F. A deep feature learning model for pneumonia detection applying a combination of mRMR feature selection and machine learning models. IRBM 2020;41(4):212–22.
  11. 11. Wang L, Wang Q, Wang X, Ma Y, Zhang L, Liu M. Triplet-constrained deep hashing for chest X-ray image retrieval in COVID-19 assessment. Neural Netw. 2024;173:106182. pmid:38387203
  12. 12. Chen T, Philippi I, Phan QB, Nguyen L, Bui NT, daCunha C, et al. A vision transformer machine learning model for COVID-19 diagnosis using chest X-ray images. Healthcare Anal. 2024;5:100332.
  13. 13. Fan Y, Gong H. An improved COVID-19 classification model on chest radiography by dual-ended multiple attention learning. IEEE J Biomed Health Inform 2023;28(1):145–56. pmid:37831572
  14. 14. Naz S, Phan K, Chen Y-PP. Centralized and federated learning for COVID-19 detection with chest X-ray images: Implementations and analysis. IEEE Trans Emerg Top Comput Intell 2024;8(4):2987–3000.
  15. 15. He B, Chen Y, Zhu D, Xu Z. Domain adaptation via Wasserstein distance and discrepancy metric for chest X-ray image classification. Sci Rep 2024;14(1):2690. pmid:38302556
  16. 16. Van Der Maaten L, Postma EO, Herik VD. Dimensionality reduction: A comparative review. Technical report TiCC TR 2009-005. Tilburg Centre for Creative Computing, Tilburg University; 2009.
  17. 17. Jolliffe IT. Principal component analysis. Springer; 2002.
  18. 18. Yanai H, Takeuchi K, Takane Y. Projection matrices, generalized inverse matrices, and singular value decomposition. Springer; 2011.
  19. 19. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data mining, inference, and prediction. Springer Nature; 2017.
  20. 20. Luo JH, Chen CC. Singular value decomposition for texture analysis. In: Tescher AG, editor. Applications of digital image processing XVII. vol. 2298 of Society of Photo-Optical Instrumentation Engineers (SPIE) conference series; 1994. p. 407–18.
  21. 21. Haines GV, Fiori RAD. Modeling by singular value decomposition and the elimination of statistically insignificant coefficients. Comput Geosci. 2013;58:19–28.
  22. 22. Hung CC, Song E, Lan Y. Image texture analysis foundations, models and algorithms. Springer; 2019.
  23. 23. Saxena SK. Coronavirus disease 2019 (COVID-19): Epidemiology, pathogenesis, diagnosis, and therapeutics. Springer; 2020.
  24. 24. Capaccione KM, Yang H, West E, Patel H, Ma H, Patel S, et al. Pathophysiology and imaging findings of COVID-19 infection: An organ-system based review. Acad Radiol 2021;28(5):595–607. pmid:33583712
  25. 25. Liu J, Tang X, Lei C. Atlas of chest imaging in COVID-19 patients. Springer; 2021.
  26. 26. Revel M-P, Parkar AP, Prosch H, Silva M, Sverzellati N, Gleeson F, et al. COVID-19 patients and the radiology department – advice from the European Society of Radiology (ESR) and the European Society of Thoracic Imaging (ESTI). Eur Radiol 2020;30(9):4903–9. pmid:32314058
  27. 27. WHO. Communicating radiation risk in paediatric imaging: Information to support health care discussions about benefits and risk. World Health Organization; 2016.
  28. 28. COVID-19 Radiography Database. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database?select=COVID-19_Radiography_Dataset.
  29. 29. Chowdhury MEH, Rahman T, Khandakar A, Mazhar R, Kadir MA, Mahbub ZB, et al. Can AI help in screening viral and COVID-19 pneumonia? IEEE Access. 2020;8:132665–76
  30. 30. Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Abul Kashem SB, et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput Biol Med. 2021;132:104319. pmid:33799220
  31. 31. Golub GH, Loan CFV. Matrix computations. Johns Hopkins University Press; 2013.
  32. 32. Baglama J, Reichel L. Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM J Sci Comput 2005;27(1):19–42.
  33. 33. Cattell RB. The Scree test for the number of factors. Multivariate Behav Res 1966;1(2):245–76. pmid:26828106
  34. 34. Zhu M, Ghodsi A. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Comput Stat Data Anal 2006;51(2):918–30.
  35. 35. Lamberti WF. An overview of explainable and interpretable AI. AI Assur. 2023:55–123. https://doi.org/10.1016/b978-0-32-391919-7.00015-9
  36. 36. Jolliffe IT. Principal component analysis. Springer; 2002.
  37. 37. Keho Y. The basics of linear principal components analysis. In: Sanguansat P, editor. Principal component analysis. Rijeka: IntechOpen; 2012.
  38. 38. Bowman AW, Azzalini A. Applied smoothing techniques for data analysis: The Kernel approach with S-plus illustrations. Oxford Statistical Sciences Series; 1997.
  39. 39. Rousseeuw PJ, Driessen KV. A fast algorithm for the minimum covariance determinant estimator. Technometrics 1999;41(3):212–23.
  40. 40. Rousseeuw PJ, Leroy AM. Robust regression and outlier detection. Wiley; 1987.
  41. 41. Hubert M, Debruyne M. Minimum covariance determinant. WIREs Comput Stats 2009;2(1):36–43.
  42. 42. Maronna RA, Martin RD, Yohai VJ, Salibián-Barrera M. Robust statistics theory and methods with R. Wiley; 2019.
  43. 43. Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH. Relief-based feature selection: Introduction and review. J Biomed Inform. 2018;85:189–203. pmid:30031057
  44. 44. Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn. 2003;53(1–2):23–69.
  45. 45. Liu H, Motoda H. Computational methods of feature selection. Chapman and Hall; 2007.
  46. 46. He YMH. Imbalanced learning: Foundations, algorithms, and applications. Wiley-IEEE Press; 2013.
  47. 47. Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F. Learning from imbalanced data sets. Springer; 2018.
  48. 48. Flach P. Machine learning. The art and science of algorithms that make sense of data. Cambridge University Press; 2001.
  49. 49. Quintero-Rincón A, Pereyra M, D’Giano C, Risk M, Batatia H. Fast statistical model-based classification of epileptic EEG signals. Biocyber Biomed Eng 2018;38(4):877–89.
  50. 50. Quintero-Rincón A, D’Giano C, Batatia H. Artefacts detection in EEG signals. Advances in Signal Processing: Rev Book Ser. 2021;2:413–41. ISBN: 978-84-09-28830-4.
  51. 51. Shoeibi A, Khodatars M, Jafari M, Ghassemi N, Sadeghi D, Moridian P, et al. Automated detection and forecasting of COVID-19 using deep learning techniques: A review. Neurocomputing. 2024;577:127317.
  52. 52. Ait Nasser A, Akhloufi MA. A review of recent advances in deep learning models for chest disease detection using radiography. Diagnostics (Basel) 2023;13(1):159. pmid:36611451
  53. 53. Meedeniya D, Kumarasinghe H, Kolonne S, Fernando C, Díez ID la T, Marques G. Chest X-ray analysis empowered with deep learning: A systematic review. Appl Soft Comput. 2022;126:109319. pmid:36034154
  54. 54. Khan E, Rehman MZU, Ahmed F, Alfouzan FA, Alzahrani NM, Ahmad J. Chest X-ray classification for the detection of COVID-19 using deep learning techniques. Sensors (Basel) 2022;22(3):1211. pmid:35161958
  55. 55. Ullah N, Javed A. Deep features comparative analysis for COVID-19 detection from the chest radiograph images. In: 2021 international conference on frontiers of information technology (FIT); 2021. p. 258–63. https://doi.org/10.1109/fit53504.2021.00055
  56. 56. Saad A, S. Kamil I, Alsayat A, Elaraby A. Classification COVID-19 based on enhancement X-ray images and low complexity model. Comput Mater Continua 2022;72(1):561–76.
  57. 57. Jusman Y, Tyassari W, Nindita W, Harahap AJH, Ismail AM. Developed histogram of oriented gradients-based feature extraction for Covid-19 X-ray image classification. In: 2022 2nd international seminar on machine learning; 2022. p. 24.
  58. 58. COVID-Net. https://alexswong.github.io/COVID-Net/.
  59. 59. DLAI3 Hackathon Phase3 COVID-19 CXR Challenge. Multi-class CXR dataset of COVID-19, thorax disease, and no finding. https://www.kaggle.com/datasets/jonathanchan/dlai3-hackathon-phase3-covid19-cxr-challenge.
  60. 60. Lin Z, He Z, Xie S, Wang X, Tan J, Lu J, et al. AANet: Adaptive attention network for COVID-19 detection from chest X-ray images. IEEE Trans Neural Netw Learn Syst 2021;32(11):4781–92. pmid:34613921
  61. 61. Jawahar M, L JA, Ravi V, Prassanna J, Jasmine SG, Manikandan R, et al. CovMnet-deep learning model for classifying coronavirus (COVID-19). Health Technol (Berl) 2022;12(5):1009–24. pmid:35966170
  62. 62. COVID chest X-ray dataset. https://github.com/ieee8023/covid-chestxray-dataset.
  63. 63. Ayalew AM, Salau AO, Abeje BT, Enyew B. Detection and classification of COVID-19 disease from X-ray images using convolutional neural networks and histogram of oriented gradients. Biomed Signal Process Control. 2022;74:103530. pmid:35096125
  64. 64. Hatamleh WA, Tarazi H, Subbalakshmi C, Tiwari B. Analysis of chest X-ray images for the recognition of COVID-19 symptoms using CNN. Wireless Commun Mobile Comput. 2022;22:1–12.
  65. 65. Hossain MS, Shorfuzzaman M. Noninvasive COVID-19 screening using deep-learning-based multilevel fusion model with an attention mechanism. IEEE Open J Instrum Meas. 2023;2:1–12.
  66. 66. Kaya M, Eris M. D3SENet: A hybrid deep feature extraction network for Covid-19 classification using chest X-ray images. Biomed Signal Process Control. 2023;82:104559. pmid:36618337
  67. 67. Dharmesh Ishwerlal R, Agarwal R, Sujatha KS. Lung disease classification using chest X ray image: An optimal ensemble of classification with hybrid training. Biomedical Signal Processing and Control. 2024;91:105941.
  68. 68. Junia RC, Selvan K. Deep learning-based automatic segmentation of COVID-19 in chest X-ray images using ensemble neural net sentinel algorithm. Measur Sensors. 2024;33:101117.
  69. 69. RSNA pneumonia detection challenge. https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data.
  70. 70. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR); 2017. p. 3462–71.
  71. 71. NIH chest X-rays. https://www.kaggle.com/datasets/nih-chest-xrays/data.
  72. 72. Chest X-rays (Indiana University). https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university
  73. 73. Bahani M, El Ouaazizi A, Avram R, Maalmi K. Enhancing chest X-ray diagnosis with text-to-image generation: A data augmentation case study. Displays. 2024;83:102735.
  74. 74. Rissanen J. Modeling by shortest data description. Automatica 1978;14(5):465–71.
  75. 75. Huffel SV, Vandewalle J. The total least squares problem. Computational aspects and analysis. Society for Industrial and Applied Mathematics; 1991.
  76. 76. Yu W, Gu Y, Li Y. Efficient randomized algorithms for the fixed-precision low-rank matrix approximation. SIAM J Matrix Anal Appl 2018;39(3):1339–59.
  77. 77. Wang J, Hou J, Eldar YC. Tensor robust principal component analysis from multilevel quantized observations. IEEE Trans Inform Theory 2023;69(1):383–406.
  78. 78. Kertesz Z, Harrington EO, Braza J, Guarino BD, Chichger H. Agonists for bitter taste receptors T2R10 and T2R38 attenuate LPS-induced permeability of the pulmonary endothelium in vitro. Front Physiol. 2022;13:794370. pmid:35399266