Abstract
A general-purpose method of emphasizing abnormal lesions in chest radiographs, named EGGPALE (Extrapolative, Generative and General-Purpose Abnormal Lesion Emphasizer), is presented. The proposed EGGPALE method is composed of a flow-based generative model and L-infinity-distance-based extrapolation in a latent space. The flow-based model is trained using only normal chest radiographs, and an invertible mapping function from the image space to the latent space is determined. In the latent space, a given unseen image is extrapolated so that the image point moves away from the normal chest X-ray hyperplane. Finally, the moved point is mapped back to the image space and the corresponding emphasized image is created. The proposed method was evaluated by an image interpretation experiment with nine radiologists and 1,000 chest radiographs, in which positive (suspected lung cancer) and negative cases were validated by computed tomography examinations. With EGGPALE-processed images, the radiologists' average sensitivity improved by 0.0559 compared with the original images, whereas their average specificity decreased by 0.0192. The area under the receiver operating characteristic curve of the ensemble of nine radiologists showed a statistically significant improvement. These results demonstrate the feasibility of EGGPALE for enhancing abnormal lesions. Our code is available at https://github.com/utrad-ical/Eggpale.
Citation: Hanaoka S, Nomura Y, Hayashi N, Sato I, Miki S, Yoshikawa T, et al. (2024) Deep generative abnormal lesion emphasization validated by nine radiologists and 1000 chest X-rays with lung nodules. PLoS ONE 19(12): e0315646. https://doi.org/10.1371/journal.pone.0315646
Editor: Asadullah Shaikh, Najran University College of Computer Science and Information Systems, SAUDI ARABIA
Received: December 17, 2023; Accepted: November 25, 2024; Published: December 12, 2024
Copyright: © 2024 Hanaoka et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data is not publicly available because data contain potentially identifying or sensitive patient information. The data requests should be sent to the Data Access Committee of the University of Tokyo Hospital (contact via hanaoka-tky@g.ecc.u-tokyo.ac.jp) for researchers who meet the criteria for access to confidential data.
Funding: The Department of Computational Radiology and Preventive Medicine, the University of Tokyo Hospital, is sponsored by HIMEDIC Inc. and Siemens Healthcare K.K. This work was supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 18K12095 and Japan Science and Technology Agency (JST), CREST Grant Number JPMJCR21M2. This work was also supported by the Joint Usage/Research Center for Interdisciplinary Large-Scale Information Infrastructures and High-Performance Computing Infrastructure Projects in Japan (Project IDs: jh170036-DAH, jh180073-DAH, jh190047-DAH and jh200042-DAH). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Chest X-ray examination is an inexpensive and easily performed medical imaging test and is widely used for the screening of various diseases, including heart failure, lung cancer, pneumonia, and tuberculosis. However, the sensitivity of chest X-rays is lower than that of computed tomography (CT) [1, 2], and sensitivity varies widely among radiologists [3–6]. In particular, lesions that are small, faint, or superimposed on a bone or the heart are difficult to detect, and experience is needed to distinguish real lesions from mimics such as superimposed vessels and/or bones.
Many image processing methods that can emphasize lesions in chest X-rays have been presented. One type of image processing method is intensity-based (including contrast stretching [7], histogram equalization (HE) [8], and contrast limited adaptive HE (CLAHE) [9]). Another type of image processing method (feature-based enhancement) emphasizes the lesions themselves, such as by using a finite impulse response filter [10], Laplacian of Gaussian (LoG) filter [11], Hessian-LoG filter [12], or wavelet transform [13]. The third type of method includes rib bone suppression techniques [14–16]. However, to the best of our knowledge, no image processing method that can emphasize abnormal lesions but does not alter the normal structure (bone, vessels, etc.) has been reported.
In recent years, image processing using deep learning has become increasingly popular [17]. Thanks to such sophisticated techniques, the performance of computer-aided detection (CAD) is improving [3, 18–21]. Nevertheless, typical supervised CAD training requires a huge training dataset in which each sample is labeled as normal or abnormal, and manual input of each lesion's position/shape is usually also needed. Moreover, a multiple-disease dataset is required to build CAD that can handle multiple types of lesions. These problems have been partly solved by preparing huge databases that include several frequent pathologies, such as the ChestX-ray14 dataset [22]. So far, however, few single methods that can handle any type of disease/lesion have been reported.
Theoretically, a model trained only with a huge set of normal cases can also detect abnormal lesions, for example, by an out-of-distribution (OoD) detection method [23]. Such a strategy is also often called one-class classification or unsupervised anomaly detection. These methodologies may enable us to build a multipurpose CAD application; however, relatively few studies have been conducted so far.
Recently, flow-based generative models, a type of unsupervised deep learning method, have emerged. Flow-based models belong to the same family of deep generative models as the variational autoencoder (VAE) [24] and the generative adversarial network (GAN) [25, 26]. The advantages of a flow-based model over the VAE and GAN include the invertibility of the mapping function and the explicit computability of the likelihood (probability density) of each sample [27]. Although many generative-model-based anomaly detection methods have been reported (denoising autoencoder (AE) [28], adversarial AE [29], AnoGAN [30], Efficient GAN [31], α-GAN [32], fast AnoGAN [33]), to the best of our knowledge, only one method that uses flow-based generative models as an anomaly detector in medical images has been reported [34].
In this study, we propose a novel abnormality-enhancing method for chest radiographs based on flow-based generative models. The method uses Glow [27], one of the state-of-the-art flow-based generative models. We chose Glow because, unlike the VAE and GAN, it guarantees the existence and uniqueness of an inverse function that maps a point from the latent space to the image space. Owing to this unique inverse function, the proposed algorithm can generate a unique anomaly-enhanced image for any input image. First, a large dataset of normal chest radiographs is used to train a Glow model. Then, using the mapping function provided by the model, an unseen input chest radiograph is mapped to a point in the latent space. The enhancement is performed by moving this point away from the “normal chest X-ray hyperplane,” which is the hyperplane that contains all normal data points in the training set. In other words, the image point in the latent space is extrapolated so that its distance from the normal hyperplane increases. We introduce a dedicated L∞-norm-based extrapolation method so that the image change becomes more lesion-specific. Finally, using the invertibility of Glow, the enhanced point is mapped back so that the corresponding enhanced image is generated. We named the proposed method Extrapolative, Generative and General-Purpose Abnormal Lesion Emphasizer, or EGGPALE.
EGGPALE only requires normal chest radiographs in its training phase. It amplifies only abnormal lesions and leaves other normal structures unchanged. This is in strong contrast to previous anomaly enhancement methods, in which alteration of image contrast/edges, vanishing of the rib cage/spinal column, etc., are inevitable. A potential advantage of this “leave normal as is” property is that a physician only has to check the amplified image and does not have to check the original image, so the workload of physicians is not doubled. We hypothesized that, using only our amplified radiographs (without checking the original images), radiologists can detect tumorous lung lesions more accurately. To test this hypothesis, image interpretation experiments involving nine radiologists were performed. Radiographs in which tumorous/nodule-like lesions had been validated by CT images were used in the experiments. We intentionally included chest radiographs with tiny nodules that were not obvious in the radiograph but were revealed by CT to evaluate the increase in human sensitivity when using EGGPALE. Although our method can theoretically enhance various types of anomalies (such as heart enlargement, pneumothorax, and pleural effusion), in this study, we focus on the enhancement of nodule-like lesions.
The contributions of this study are as follows:
- A novel abnormality-enhancing method, EGGPALE, is proposed. Using the invertibility of flow-based models and extrapolation in the latent space, EGGPALE can successfully enhance various pathological lesions.
- Image interpretation experiments involving nine radiologists and 1,000 chest X-ray images demonstrated that EGGPALE can increase the sensitivity of radiologists in detecting nodule-like lesions without significantly increasing the image interpretation time.
To the best of our knowledge, this is the first study in which the sensitivity of physicians to chest X-ray lung nodule-like lesions was improved by abnormality enhancement without changing normal structures or image contrast. Moreover, although a number of previous works have synthesized chest X-rays with GANs [35–37], this is the first study in which chest X-ray images were synthesized by a flow-based algorithm.
Methods
A. Background—Glow
Glow [27] is a flow-based generative model that can estimate the probability density function (PDF) of given training data (i.e., images). Simultaneously, it learns a mapping function that maps an image instance to a point in a latent space in which the PDF takes a simple explicit form (e.g., a multidimensional Gaussian distribution). Let the image resolution be w × h, and let x represent an image vector (whose length equals wh). Let the true PDF of the given image domain (e.g., normal chest X-rays) be p(x). Consider the estimation of p(x) by a parametric function pθ(x), where θ is a parameter set to be optimized. Given the training data (i.e., X-ray images) x(i), i = 1, 2, …, N, this problem is formulated as maximum-likelihood estimation, i.e., minimization of the average negative log-likelihood:
$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} -\log p_\theta\left(\mathbf{x}^{(i)}\right), \qquad \theta^{*} = \arg\min_{\theta} \mathcal{L}(\theta) \tag{1}$$
Again, pθ(x) is a parametric function that approximates p(x). The first main idea of a flow-based algorithm [38, 39] is to introduce an invertible function fθ and a latent representation z as follows:
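$$\mathbf{z} \sim p_\theta(\mathbf{z}) = \mathcal{N}(\mathbf{z}; \mathbf{0}, \mathbf{I}) \tag{2}$$
$$\mathbf{z} = f_\theta(\mathbf{x}), \qquad \mathbf{x} = f_\theta^{-1}(\mathbf{z}) \tag{3}$$
where $\mathcal{N}(\mathbf{z}; \mathbf{0}, \mathbf{I})$ denotes a wh-dimensional multivariate standard Gaussian distribution. Note that the number of dimensions of the latent representation z is the same as that of the image x.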
The second key idea is to decompose fθ into a chain (composition) of invertible functions, fθ = fK ∘ fK−1 ∘ ⋯ ∘ f2 ∘ f1. Each function fi, i = 1, 2, …, K, is a simple, explicitly invertible parametric function whose parameters are also determined by θ, and the determinant of its Jacobian matrix must be explicitly computable. Then the relationship between x and z becomes
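$$\mathbf{x} \overset{f_1}{\longleftrightarrow} \mathbf{h}_1 \overset{f_2}{\longleftrightarrow} \mathbf{h}_2 \cdots \overset{f_K}{\longleftrightarrow} \mathbf{z} \tag{4}$$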
where hi = fi(hi−1). This sequence of invertible functions is called a “normalizing flow”. Then, using the change-of-variables formula and the chain rule for composite functions, log pθ(x) can be calculated as follows:
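$$\log p_\theta(\mathbf{x}) = \log p_\theta(\mathbf{z}) + \sum_{i=1}^{K} \log \left| \det\left( \frac{d\mathbf{h}_i}{d\mathbf{h}_{i-1}} \right) \right| \tag{5}$$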
where h0 ≜ x and hK ≜ z. The matrix dhi/dhi−1 is the Jacobian matrix of the function fi. Note that $\log p_\theta(\mathbf{z})$ can be calculated easily because of Eq (2). Using Eq (5), the parameter θ of the estimated PDF pθ(x) can be estimated by searching for the value of θ that minimizes Eq (1).
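To make Eqs (1) and (5) concrete, the following NumPy sketch evaluates log pθ(x) for a toy flow built from elementwise affine layers. This is only an illustration under stated assumptions: the class `ElementwiseAffine` is a hypothetical stand-in for Glow's layers, chosen because its Jacobian is diagonal, so each log-determinant is simply the sum of log|a|.

```python
import numpy as np


class ElementwiseAffine:
    """Hypothetical toy invertible layer h_i = a * h_{i-1} + b (not a Glow layer)."""

    def __init__(self, a, b):
        self.a = np.asarray(a, dtype=float)  # per-element scale, must be nonzero
        self.b = np.asarray(b, dtype=float)  # per-element shift

    def forward(self, h):
        # Diagonal Jacobian: log|det(dh_i/dh_{i-1})| = sum(log|a|)
        return self.a * h + self.b, np.sum(np.log(np.abs(self.a)))

    def inverse(self, h):
        return (h - self.b) / self.a


def log_standard_normal(z):
    # log N(z; 0, I) for a D-dimensional vector z (Eq 2)
    return -0.5 * (z.size * np.log(2.0 * np.pi) + np.sum(z ** 2))


def flow_log_likelihood(x, layers):
    """Change of variables (Eq 5): log p(x) = log p(z) + sum_i log|det J_i|."""
    h, total_logdet = x, 0.0
    for layer in layers:  # h_0 = x, ..., h_K = z
        h, logdet = layer.forward(h)
        total_logdet += logdet
    return log_standard_normal(h) + total_logdet


def flow_nll(data, layers):
    # Average negative log-likelihood over training samples (the objective of Eq 1)
    return -np.mean([flow_log_likelihood(x, layers) for x in data])
```

Because a composition of elementwise affine maps is itself affine, the result can be checked against a closed-form Gaussian density; a real Glow model replaces these layers with actnorm, 1 × 1 convolutions, and affine couplings, but the bookkeeping of Eq (5) is the same.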
For conciseness, the details of the Glow framework are omitted in this paper. A diagram of the entire Glow network used in this study is shown in Fig 1. As described in the original Glow paper [27], we used actnorm, an invertible 1 × 1 convolution, and an affine coupling layer as the components of each step. The number of steps per level was 32, and the total number of levels was seven. Squeeze and split operations were inserted between adjacent levels. An image resolution of w = h = 512 was used; therefore, both the input x and the latent representation z were 262,144-dimensional vectors (512 × 512 = 262,144). Most hyperparameters were the same as in [27]: the numbers of filters (channels) were unchanged (see Fig 1), the input size was doubled (to 512 × 512), and the number of epochs was fixed at 22 (the largest number feasible with our computational resources). The other parameters were as follows:
- Number of levels in the Glow model: 7
- Number of affine coupling layers per level: 32
- Number of convolution layers in each affine coupling layer: 3
- Kernel size in each convolutional layer: 3×3
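The three components of one Glow step listed above can be sketched in NumPy as follows, mainly to show why each is invertible and how its log-determinant is obtained. This is a hedged toy, not the network of Fig 1: tensor sizes are tiny, and `coupling_net` is a hypothetical per-pixel linear stand-in for the three 3 × 3 convolutional layers (the shift/scale split and the sigmoid(· + 2) scale follow the public Glow code).

```python
import numpy as np

rng = np.random.default_rng(0)


def actnorm_forward(x, scale, bias):
    # x: (C, H, W); per-channel affine map, log|det| = H * W * sum(log|scale|)
    _, h, w = x.shape
    y = scale[:, None, None] * (x + bias[:, None, None])
    return y, h * w * np.sum(np.log(np.abs(scale)))


def actnorm_inverse(y, scale, bias):
    return y / scale[:, None, None] - bias[:, None, None]


def conv1x1_forward(x, weight):
    # Invertible 1x1 convolution: the same C x C matrix mixes channels at every pixel
    _, h, w = x.shape
    y = np.einsum("ij,jhw->ihw", weight, x)
    return y, h * w * np.linalg.slogdet(weight)[1]


def conv1x1_inverse(y, weight):
    return np.einsum("ij,jhw->ihw", np.linalg.inv(weight), y)


def coupling_net(xa, params):
    # Hypothetical stand-in for the 3-layer 3x3 CNN: per-pixel linear map + tanh
    hidden = np.tanh(np.einsum("ij,jhw->ihw", params["w1"], xa))
    out = np.einsum("ij,jhw->ihw", params["w2"], hidden)
    shift, raw_scale = out[0::2], out[1::2]
    scale = 1.0 / (1.0 + np.exp(-(raw_scale + 2.0)))  # sigmoid(raw + 2), always > 0
    return shift, scale


def coupling_forward(x, params):
    xa, xb = np.split(x, 2, axis=0)  # first half of the channels passes through unchanged
    shift, scale = coupling_net(xa, params)
    return np.concatenate([xa, (xb + shift) * scale], axis=0), np.sum(np.log(scale))


def coupling_inverse(y, params):
    ya, yb = np.split(y, 2, axis=0)
    shift, scale = coupling_net(ya, params)  # ya equals xa, so the net can be re-evaluated
    return np.concatenate([ya, yb / scale - shift], axis=0)


def glow_step_forward(x, p):
    y, ld1 = actnorm_forward(x, p["an_scale"], p["an_bias"])
    y, ld2 = conv1x1_forward(y, p["w"])
    y, ld3 = coupling_forward(y, p["coupling"])
    return y, ld1 + ld2 + ld3


def glow_step_inverse(y, p):
    x = coupling_inverse(y, p["coupling"])
    x = conv1x1_inverse(x, p["w"])
    return actnorm_inverse(x, p["an_scale"], p["an_bias"])


C, H, W = 4, 8, 8  # toy sizes only
params = {
    "an_scale": rng.uniform(0.5, 1.5, C),
    "an_bias": rng.normal(size=C),
    "w": np.linalg.qr(rng.normal(size=(C, C)))[0],  # random rotation, as in Glow's initialization
    "coupling": {"w1": rng.normal(size=(16, C // 2)), "w2": 0.1 * rng.normal(size=(C, 16))},
}
```

A full level would stack 32 such steps and wrap them with squeeze/split; the inverse simply runs the steps in reverse order, which is what makes the latent-to-image mapping used by EGGPALE exact.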
B. Modification of Glow
We slightly modified the original Glow algorithm. First, the weight matrix W of each invertible 1 × 1 convolution (see Fig 1) can collapse during training, so that W becomes rank deficient and has no inverse W⁻¹. This phenomenon occasionally occurred and broke the invertibility of fθ. To avoid this, at the end of each epoch, W was modified as follows using singular value decomposition (SVD):
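$$\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T} = \mathrm{SVD}(\mathbf{W}), \qquad \boldsymbol{\Sigma}_{\mathrm{new}} = \mathrm{crop}\left(\boldsymbol{\Sigma}, 10^{-3}, 10^{3}\right), \qquad \mathbf{W}_{\mathrm{new}} = \mathbf{U}\boldsymbol{\Sigma}_{\mathrm{new}}\mathbf{V}^{T} \tag{6}$$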
where crop denotes element-wise clipping, applied so that all diagonal elements of the diagonal matrix Σnew lie in the range $[10^{-3}, 10^{3}]$. Note that this operation ensures the existence of $(\mathbf{W}_{\mathrm{new}})^{-1} = \mathbf{V}(\boldsymbol{\Sigma}_{\mathrm{new}})^{-1}\mathbf{U}^{T}$.
The second modification is that the distribution p(z) is fixed to the standard Gaussian distribution instead of a parametrized (non-standard) Gaussian distribution. This is because we need the PDF of z to be isotropic so that Euclidean and L∞ distances in the latent space are meaningful.
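The SVD-based repair of Eq (6) can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' TensorFlow code, and the function name `stabilize_conv1x1_weight` is hypothetical; NumPy's `np.linalg.svd` returns U, the singular values, and Vᵀ such that W = U diag(Σ) Vᵀ.

```python
import numpy as np


def stabilize_conv1x1_weight(weight, lo=1e-3, hi=1e3):
    """Clip the singular values of W into [lo, hi] (Eq 6) so that W stays invertible."""
    u, sigma, vt = np.linalg.svd(weight)      # weight = u @ diag(sigma) @ vt
    sigma_new = np.clip(sigma, lo, hi)         # the element-wise "crop" operation
    w_new = u @ np.diag(sigma_new) @ vt
    w_new_inv = vt.T @ np.diag(1.0 / sigma_new) @ u.T  # (W_new)^-1 = V Sigma_new^-1 U^T
    return w_new, w_new_inv
```

A well-conditioned W passes through unchanged, while a rank-deficient W has its zero singular values raised to 10⁻³, so an explicit inverse always exists.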
C. Normal chest X-ray hyperplane
The input image x is first mapped to a point z in the latent space using z = fθ(x), and the anomaly is then amplified in the latent space as follows. The simplest approach is to move z away from the origin O, which corresponds to the trained average image. However, this often results in significant deformation of the original image, such as distortion of the thoracic cage. To address this issue, we adopt a more sophisticated approach using the normal chest X-ray hyperplane S (see Fig 2).
Let x(i), i = 1, 2, …, N be the training data (normal cases only), and suppose that N < wh. Let the mapped points of x(i) be z(i) = fθ(x(i)). Then, these N points in the latent space, z(i), i = 1, 2, …, N, together with the origin O, span an (at most) N-dimensional linear subspace through O (exactly N-dimensional if the z(i) are linearly independent). We call this subspace the normal chest X-ray hyperplane S.
Suppose that the input image (for abnormality enhancement) is xinput, and let the corresponding point in the latent space be zinput = fθ(xinput). Consider the perpendicular line from zinput down to the hyperplane S, and let the intersection of this perpendicular line and S be z0 (see Fig 2); that is, z0 is the orthogonal projection of zinput onto S. Note that the corresponding image fθ⁻¹(z0) can be regarded as a “virtually normalized” version of the given (possibly abnormal) image xinput. How to calculate S and z0 is described in S1 File.
Then, for example, a simple extrapolation (with Euclidean distance) can be performed as
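$$\hat{\mathbf{z}} = \mathbf{z}_0 + \gamma\left(\mathbf{z}_{\mathrm{input}} - \mathbf{z}_0\right) \tag{7}$$
The parameter γ determines the strength of abnormality enhancement. When γ > 1, the point is pushed beyond zinput, further away from S, and mapping it back to the image space yields an abnormality-enhanced image.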
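The projection onto S and the Euclidean extrapolation of Eq (7) can be sketched as follows. The authors' exact procedure is in S1 File; this is a hedged reconstruction that assumes S is the linear span of the training latents through O (as defined above) and obtains an orthonormal basis with a reduced QR factorization, in line with the QR factorization of Ztraining described in the experimental settings. All function names are hypothetical, and the toy matrix sizes stand in for the real 262,144 × 27,504 matrix.

```python
import numpy as np


def fit_normal_subspace(z_training):
    """Orthonormal basis Q of span{z(i)}; z_training is D x N with one latent per column."""
    q, _ = np.linalg.qr(z_training, mode="reduced")  # assumes full column rank
    return q


def project_onto_subspace(z_input, q):
    # Foot of the perpendicular from z_input to S: z0 = Q Q^T z_input
    return q @ (q.T @ z_input)


def euclidean_extrapolate(z_input, z0, gamma):
    # Eq (7): gamma > 1 moves the point further away from S along the perpendicular
    return z0 + gamma * (z_input - z0)
```

Since zinput − z0 is orthogonal to S, the extrapolated point still projects onto z0, and its distance from S is exactly γ times the original distance.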
D. Extrapolation with L∞ distance
Although the Euclidean distance-based extrapolation works well, it tends to excessively amplify shifts, rotations, local deformations, and so forth, in addition to abnormal lesions. We hypothesize that this is because all elements of z are altered simultaneously in Eq (7). Instead, we wish to amplify only the elements of z that are responsible for the abnormal lesions, leaving the other elements as they are. To solve this problem, we introduce an L∞ distance-based extrapolation.
Fig 3 shows an outline of L∞ distance-based extrapolation. This extrapolation consists of three steps: (1) finding z0, (2) interpolation using the L∞ distance, and (3) final extrapolation with the Euclidean distance. Step (1) is the same as that in the previous section: the projected point z0 is determined on S. Then, in step (2), interpolation between zinput and z0 is performed. Let the interpolated point between zinput and z0 be $\tilde{\mathbf{z}}$. For the sake of explanation, the interpolation operation can first be written in a Euclidean distance-based manner:
$$\tilde{\mathbf{z}} = \arg\min_{\mathbf{z} \in C_2} \left\| \mathbf{z} - \mathbf{z}_{\mathrm{input}} \right\|_2 \tag{8}$$
for the hypersphere (closed ball) $C_2 = \{\mathbf{z} \mid \|\mathbf{z} - \mathbf{z}_0\|_2 \le \gamma \cdot \|\mathbf{z}_{\mathrm{input}} - \mathbf{z}_0\|_2\}$ (Fig 3A). This definition is equivalent to Eq (7) when 0 ≤ γ ≤ 1. Then, we replace the L2 norms with the L∞ and L1 norms, as follows:
$$\tilde{\mathbf{z}} = \arg\min_{\mathbf{z} \in C_\infty} \left\| \mathbf{z} - \mathbf{z}_{\mathrm{input}} \right\|_1 \tag{9}$$
for the hypercube $C_\infty = \{\mathbf{z} \mid \|\mathbf{z} - \mathbf{z}_0\|_\infty \le \gamma \cdot \|\mathbf{z}_{\mathrm{input}} - \mathbf{z}_0\|_\infty\}$ (Fig 3B and 3C). Intuitively, this means that the point $\tilde{\mathbf{z}}$ moves along a polygonal line, instead of a straight line, between z0 and zinput. When γ increases from 0 to 1, $\tilde{\mathbf{z}}$ first moves away from z0 along an L∞ (chessboard) path, i.e., diagonally, and then moves along an L1 (Manhattan) path, i.e., horizontally or vertically, toward zinput. The actual calculation can simply be performed by the following element-wise range-cropping function [40]:
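$$\{\tilde{\mathbf{z}}\}_j = \min\left( \max\left( \{\mathbf{z}_{\mathrm{input}}\}_j,\ \{\mathbf{z}_0\}_j - \Gamma \right),\ \{\mathbf{z}_0\}_j + \Gamma \right) \tag{10}$$
where {⋅}j denotes the jth element and Γ = γ ⋅ ‖zinput − z0‖∞. As a result, many elements of $\tilde{\mathbf{z}}$ are the same as those of zinput (namely, every element j with |{zinput}j − {z0}j| ≤ Γ). Then, in step (3), the final extrapolation is performed between $\tilde{\mathbf{z}}$ and zinput using the Euclidean distance,
$$\hat{\mathbf{z}} = \tilde{\mathbf{z}} + \beta\left(\mathbf{z}_{\mathrm{input}} - \tilde{\mathbf{z}}\right) \tag{11}$$
with the extrapolation parameter β > 1. Because many elements of $\tilde{\mathbf{z}}$ are the same as those of zinput, this extrapolation alters only a small number of elements, so only abnormal lesions are enhanced. In this study, the parameters γ = 0.2 and β = 1.2 were used; they were determined experimentally by a grid search combined with subjective image evaluation by a radiologist. Too large a value of β can lead to large local or global deformation of the resulting image. Finally, the enhanced image $\hat{\mathbf{x}} = f_\theta^{-1}(\hat{\mathbf{z}})$ is generated using Glow.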
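Steps (2) and (3) of the L∞ extrapolation reduce to a clip followed by a linear extrapolation, as in the NumPy sketch below. It assumes z0 has already been obtained by projection onto S; the function names are hypothetical, while the default γ = 0.2 and β = 1.2 are the values reported above. Clipping zinput into the hypercube around z0 is the element-wise solution of the L1-minimization over C∞, because that problem separates into independent one-dimensional problems.

```python
import numpy as np


def linf_interpolate(z_input, z0, gamma):
    """Eq (10): clip z_input into the hypercube of half-width Gamma centred at z0."""
    radius = gamma * np.max(np.abs(z_input - z0))  # Gamma = gamma * ||z_input - z0||_inf
    return np.clip(z_input, z0 - radius, z0 + radius)


def linf_extrapolate(z_input, z0, gamma=0.2, beta=1.2):
    """Eq (11): extrapolate from the clipped point through z_input by a factor beta > 1."""
    z_tilde = linf_interpolate(z_input, z0, gamma)
    return z_tilde + beta * (z_input - z_tilde)
```

Elements whose deviation from z0 is within Γ are returned unchanged; only the few elements with large deviations, which the method associates with abnormal lesions, are pushed further out. For example, with z0 = 0 and zinput = (0.05, −0.1, 1.0, 0.02, −0.5), only the third and fifth elements change, to 1.16 and −0.56.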
Fig 4 shows pseudocode for the proposed EGGPALE enhancement.
E. Experimental settings
This retrospective study was approved by our institutional review board (the University of Tokyo Hospital, IRB protocol number 2561-(19), date of approval 2020/9/16). The IRB waived the requirement for informed consent for this retrospective study. The data were accessed for research on March 16th, 2021. The authors did not have access to information that could identify individual participants during or after data collection. We used the ChestX-ray14 open dataset [22] to train the proposed method. Among the 112,120 chest radiographs in ChestX-ray14, we extracted 39,302 radiographs that are labeled as normal and were acquired in the posteroanterior (PA) projection. We then randomly selected approximately 70% of them, resulting in 27,504 normal chest radiographs (N = 27,504). Some other radiographs with abnormal labels were also used in our subjective qualitative experiment.
For our quantitative blind image-reading experiment, we used our in-house radiograph dataset, which was validated by CT. Positive and negative cases were collected from the University of Tokyo Hospital. For the positive cases, the inclusion criteria were (1) chest CT examinations performed in the University of Tokyo Hospital from January 2017 to June 2018 and (2) a radiological diagnosis report including “lung cancer” or “suspected lung cancer.” A total of 604 cases met these criteria; of these, cases without a corresponding chest radiograph (taken within one month before or after the CT examination) were excluded. Cases with diseases unrelated to lung field/hilar/mediastinal masses/nodules (pneumonia, pleural effusion, etc.) were also excluded. After exclusion, 509 cases remained. We used 100 of the 509 cases for development and for the subjective qualitative study, and the other 409 cases for the quantitative image-reading experiment. Note that we did not set any tumor-size-based criterion. Fig 5 shows a histogram of the sizes of the tumorous lesions included in our positive cases (measured in CT images).
Negative cases were collected from the health check program of the University of Tokyo Hospital. The inclusion criteria were (1) chest CT examinations performed for the health check program, (2) negative findings in the chest CT reports, and (3) availability of a corresponding chest radiograph (taken within one month before or after the CT examination). Finally, 591 cases were randomly selected from the cases meeting the criteria.
Before both training and the experiments, all images were preprocessed as follows. First, the pixel intensities were clipped and rescaled so that the intensity range was [0, 255]. For the clipping range, we used the DICOM tags “window center / window width” (tag IDs: (0028,1050) and (0028,1051), respectively), and the indicated range was then rescaled to [0, 255]. Next, the images were resized to a width of 1,024 pixels. If necessary, the top/bottom borders were truncated so that the image height was also 1,024 pixels. Before processing by Glow, images were downsampled to 512 × 512 pixels; after enhanced image generation, they were upsampled back to 1,024 × 1,024 pixels using bicubic interpolation. Therefore, all image interpretation experiments were performed by radiologists using images of 1,024 × 1,024 pixels.
Glow was trained using 27,504 chest radiographs from the ChestX-ray14 dataset as described above. Training was performed on the Reedbush-L supercomputer system, using one node with two Intel Xeon E5-2695v4 processors, 256 GB of memory, and four GPUs (Tesla P100, NVIDIA Corporation, Santa Clara, CA). Data parallelism using Horovod and TensorFlow was utilized. The minibatch size was four. No early stopping was used, and the training duration was 168 h (the upper limit of the supercomputer system usage). The loss curve in the training phase is shown in S2 File.
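The preprocessing pipeline described above can be sketched as follows. Several details are assumptions that the text does not specify and are labeled as such in the comments: the windowing uses a simplified linear formula (the DICOM VOI LUT definition adds small offsets), the resize to a 1,024-pixel width uses nearest-neighbour sampling, the height is trimmed equally from top and bottom, and the 512 × 512 downsampling uses 2 × 2 block averaging. The function names are hypothetical, and reading the DICOM tags themselves is left out.

```python
import numpy as np


def apply_window(pixels, center, width):
    """Clip to [center - width/2, center + width/2] and rescale linearly to [0, 255].

    Simplified linear windowing (assumption); the DICOM VOI LUT formula differs slightly.
    """
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(pixels.astype(float), lo, hi)
    return (clipped - lo) / (hi - lo) * 255.0


def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resampling (assumption: the interpolation for this step is not stated)
    rows = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def to_square_1024(img):
    # Resize to a width of 1,024 px keeping the aspect ratio, then trim the height to 1,024 px
    h, w = img.shape
    new_h = int(round(h * 1024 / w))
    img = resize_nearest(img, new_h, 1024)
    if new_h > 1024:  # assumption: trim equally from top and bottom
        top = (new_h - 1024) // 2
        img = img[top:top + 1024]
    return img


def downsample_to_512(img):
    # Assumption: 2x2 block averaging from 1,024 x 1,024 to 512 x 512 before Glow
    return img.reshape(512, 2, 512, 2).mean(axis=(1, 3))
```

The inverse step after enhancement (bicubic upsampling back to 1,024 × 1,024) is omitted here, since it depends on the image library in use.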
Abnormality enhancement was performed using the trained Glow system. The normal chest X-ray hyperplane was also calculated using all 27,504 training cases. The QR factorization of Ztraining (whose size was 262,144 × 27,504) was also performed by Reedbush-L using the ScaLAPACK library [41]. In the abnormality enhancement, we did not have to use data parallelism, and thus only one GPU was used for enhancing images. Processing of one radiograph took approximately 5 s.
After abnormality enhancement was applied to the 1,000 radiographs (409 positives and 591 negatives, as described above), a quantitative blind image-reading experiment was performed. Nine radiologists with 3, 3, 3, 4, 5, 7, 9, 18, and 31 years of experience participated in this study. The 1,000 radiographs were divided into two groups, A and B. Each radiologist was first asked to interpret a shuffled mixture of the original radiographs of A and the enhanced radiographs of B. Two weeks later, each radiologist was asked to interpret the enhanced radiographs of A and the original radiographs of B. The interpretation was carried out using our in-house browser-based software. For each case, each radiologist indicated the presence or absence of nodular lesion(s) in each lung (including the adjacent mediastinal area). Thus, each radiologist provided a total of 1,000 cases × 2 (enhanced/unenhanced) × 2 lung fields = 4,000 inputs. The reaction time of each interpretation was also recorded. After all images were interpreted by the nine radiologists, each input was judged as correct or incorrect on the basis of ground truth determined using CT images. The accuracy, sensitivity, and specificity were calculated for each radiologist with and without the proposed abnormality enhancement. Furthermore, the receiver operating characteristic (ROC) curve of the ensemble of all nine radiologists was analyzed. Statistical analysis was performed using R 4.0.2 (The R Project for Statistical Computing, Vienna, Austria).
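The per-reader metrics described above, and the paired comparison across readers reported in the Results, can be sketched as follows. This is a hedged illustration in Python, not the authors' R code: it assumes one binary call per lung field (1 = nodule present) compared against the CT-based ground truth, and it computes only the paired Student's t statistic (df = number of readers − 1); in R, the p-value would come from `t.test(..., paired = TRUE)`. Function names are hypothetical.

```python
import numpy as np


def reader_metrics(calls, truth):
    """Sensitivity, specificity, and accuracy from per-lung binary calls (1 = nodule)."""
    calls, truth = np.asarray(calls, bool), np.asarray(truth, bool)
    tp = np.sum(calls & truth)     # lungs with a nodule that were called positive
    tn = np.sum(~calls & ~truth)   # nodule-free lungs that were called negative
    sensitivity = tp / np.sum(truth)
    specificity = tn / np.sum(~truth)
    accuracy = (tp + tn) / truth.size
    return sensitivity, specificity, accuracy


def paired_t_statistic(with_method, without_method):
    """Paired Student's t statistic over readers (df = n_readers - 1)."""
    d = np.asarray(with_method, float) - np.asarray(without_method, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

With nine readers, `with_method` and `without_method` would each hold nine sensitivities (or specificities), one per radiologist, computed with and without EGGPALE.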
As a baseline method, we also performed an experiment with an open-source rib bone suppression method [42]. Apart from the different enhancing/suppressing methods, the experimental setting was the same as that of the main experiment described above.
Finally, we performed an experiment in which the amount of nodulous region enhancement was quantitatively evaluated. First, we manually inputted the region of interest (ROI) of each nodulous lesion in each of 100 cases (the dataset for the subjective qualitative study). This dataset included 133 nodules. We also semiautomatically extracted the lung field of each of the 100 images. Then, the contrast-to-noise ratio (CNR) was calculated for each nodule as follows:
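$$\mathrm{CNR} = \frac{\left|\mu_{\mathrm{nodule}} - \mu_{\mathrm{lungfield}}\right|}{\sigma_{\mathrm{lungfield}}} \tag{12}$$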
where μnodule, μlungfield, and σlungfield are the mean intensity of the nodule ROI, the mean intensity of the lung field, and the standard deviation of the lung field, respectively. In the same way, the CNR of each EGGPALE-enhanced image, or CNR+, was also calculated. Then, the relative improvement of CNR by EGGPALE was estimated as follows:
$$\Delta\mathrm{CNR} = \frac{\mathrm{CNR}^{+} - \mathrm{CNR}}{\mathrm{CNR}} \tag{13}$$
The distribution of ΔCNR was evaluated by plotting a histogram.
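The CNR computation can be sketched as follows, given boolean masks for the nodule ROI and the lung field. Two assumptions should be noted: the CNR uses the absolute difference of means over the lung-field standard deviation, and ΔCNR is taken as the relative change (CNR⁺ − CNR)/CNR, matching the description of a "relative improvement"; the function names are hypothetical. NumPy's default population standard deviation (ddof = 0) is used.

```python
import numpy as np


def cnr(image, nodule_mask, lung_mask):
    # Eq (12), assuming CNR = |mu_nodule - mu_lungfield| / sigma_lungfield
    mu_nodule = image[nodule_mask].mean()
    lung = image[lung_mask]
    return abs(mu_nodule - lung.mean()) / lung.std()  # population std (ddof=0)


def delta_cnr(original, enhanced, nodule_mask, lung_mask):
    # Eq (13), assuming the relative change (CNR+ - CNR) / CNR
    before = cnr(original, nodule_mask, lung_mask)
    after = cnr(enhanced, nodule_mask, lung_mask)
    return (after - before) / before
```

A positive ΔCNR means the nodule stands out more from the lung-field background after enhancement; applying `delta_cnr` to each of the 133 ROIs gives the values whose histogram is evaluated.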
Results
Our proposed EGGPALE method successfully enhanced all input images. Examples of normal cases and cases with nodular lesions, pneumonia/consolidation, pleural effusion, heart enlargement, etc. are shown in Figs 6 and 7. As shown, local abnormal structures were successfully enhanced with apparently no change to normal structures.
The effect of enhancement is shown in the subtracted images (right column). The answer column indicates the true nodule that should be detected by radiologists. The true lesions are indicated by arrows.
To confirm that EGGPALE does not largely change normal chest X-ray images, a Turing-type test was performed using 100 normal images. Two radiologists independently evaluated 100 pairs of images, each pair consisting of one image with and one without EGGPALE processing. The radiologists were asked to determine whether each image had been processed with EGGPALE. In total, 200 images were shuffled and presented to each radiologist. The resulting accuracies were 0.79 and 0.63 for the two radiologists, indicating that 21–37% of images were misclassified (either original images misclassified as EGGPALE-processed or vice versa). Because a substantial fraction of images could not be correctly identified, we conclude that EGGPALE has only a limited impact on the appearance of normal images.
The results of the quantitative blind image-reading experiment are shown in Table 1 and Figs 7 and 8. Fig 7 demonstrates the results of nodular lesion emphasis. Table 1 and Fig 8 show the changes in the sensitivity and specificity of each radiologist with EGGPALE. The average improvement in sensitivity was 0.0559, whereas the average decrease in specificity was 0.0192. Paired Student's t-tests showed significant differences for both (p = 1.79 × 10⁻⁵ and p = 0.0017, respectively). The sensitivity improved and the specificity decreased for all nine radiologists. Therefore, we concluded that EGGPALE successfully increased the sensitivity with a small deterioration of specificity. In contrast, the sensitivity deteriorated for all nine radiologists when the bone suppression method [42] was used. Therefore, the proposed method outperformed this existing method in our single-reading setting.
Open and closed circles represent results without and with EGGPALE, respectively. YOE = years of experience.
Each row corresponds to one radiologist. Each column indicates whether the images shown were emphasized by EGGPALE or not.
Fig 9 shows the average reaction times of the radiologists. EGGPALE slightly increased the interpretation time for most radiologists, but the differences were small (the average increase was 444 ms per case).
YOE = years of experience.
Fig 10 shows the ROC curves of the ensemble of all nine radiologists. The areas under the ROC curves (AUCs) were 0.827 and 0.846 without and with EGGPALE, respectively. According to DeLong's test, the difference between the AUCs was statistically significant (p = 0.04311).
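DeLong's test compares two correlated AUCs measured on the same cases, as here (the same lungs read with and without EGGPALE). The NumPy sketch below implements the standard DeLong et al. (1988) variance estimate for the paired case; it is an illustration, not the authors' analysis script. How the ensemble score was formed is not stated in the text, so the example scores could be, hypothetically, the number of the nine radiologists who marked each lung as positive. In R, the pROC package offers this test via `roc.test(..., method = "delong")`.

```python
import math

import numpy as np


def _placements(pos, neg):
    # psi(x, y) = 1 if x > y, 0.5 if tied, 0 otherwise, for every positive/negative pair
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 (per positive), V01 (per negative)


def delong_paired_test(scores_a, scores_b, labels):
    """DeLong test for two correlated AUCs computed on the same cases."""
    labels = np.asarray(labels, bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        p10, p01 = _placements(s[labels], s[~labels])
        v10.append(p10)
        v01.append(p01)
        aucs.append(p10.mean())  # AUC equals the mean placement value
    m, n = labels.sum(), (~labels).sum()
    # 2 x 2 covariance of the two AUC estimates (np.cov uses the 1/(k-1) normalization)
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]  # assumed > 0
    z = (aucs[1] - aucs[0]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p_value
```

The covariance term is what distinguishes the paired test from comparing two independent AUCs: because both readings share the same lungs, positively correlated errors shrink the variance of the difference.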
Fig 11 shows the histogram of the change in CNR, ΔCNR, of the nodulous regions. As shown, CNR was improved for most (113 out of 133) of the nodulous regions. Therefore, we concluded that our method can enhance most nodulous regions without significantly changing the contrast of the background (lung field) pixels.
Discussion
The effective use of CAD software in daily routine work is a difficult problem to address. Sometimes CAD is used in a “concurrent reading” style, in which physicians read the original image and the CAD output simultaneously. Usually, however, CAD is used in a “second reading” style, in which physicians read the original image first and then check the CAD output [43]. Inevitably, both reading styles increase the reading time. This is one of the barriers to the widespread use of CAD software among radiologists, who often read 10³–10⁴ images in a single day. In this study, we used a “single reading” style, in which only the CAD output (i.e., the enhanced image) was checked. Therefore, the increase in reading time was minimal (444 ms per case). Although we have not shown that all pathologies can be diagnosed by reading EGGPALE-enhanced chest X-ray images, our experiment showed that, with the enhanced radiographs, radiologists can detect more lung cancers/cancerous lesions with a minimal increase in reading time. We hope that EGGPALE can help busy radiologists and boost their performance in this sense.
In the image-reading experiment, a statistically significant improvement of sensitivity was observed. It is probable that abnormality enhancement can prevent radiologists from overlooking some tumorous lesions. It is also possible that very faint lesions that are below the recognition threshold of most radiologists became recognizable after EGGPALE processing. Note that distinguishing these two phenomena by experiments is very difficult. However, through our careful subjective observation (Fig 7) and quantitative experiment (Fig 11), it is suggested that lesions below the human recognition threshold can be made visible by EGGPALE. Therefore, it is possible that EGGPALE can not only reduce incidences of overlooked abnormalities but also improve the sensitivity of the X-ray examination itself. Note that the sensitivities in this study were much lower than those of previous studies because we included lesions that were not apparently visible in chest radiographs and only visible in CT.
Along with improved sensitivity, the reduced specificity of all radiologists was observed. The reduction tended to be larger for the radiologists with higher sensitivity. The sensitivity–specificity plot showed that the point for each radiologist on the plot tended to move parallel to the distribution of the nine radiologists’ sensitivities and specificities. In addition, the change in accuracy was approximately zero. Thus, it can be considered that, although EGGPALE changed the threshold of each radiologist on his/her ROC curve, it had little power to push up the ROC curve itself. However, EGGPALE improved the mean sensitivity by 0.0559 and reduced the specificity by 0.0192. Note that the +5.5% of sensitivity improvement was comparable to that (+5.2%) of the commercial chest X-ray nodule CAD (Samsung ALND) reported in [44]. In detail, the average sensitivities of the radiologists without and with their CAD were 65.1% and 70.3%, where the average numbers of false positives per image were 0.2 and 0.18, respectively. Note that their experiment with CAD was performed with a second-reading style (on the other hand, ours is with a single-reading style and is more challenging). In screening tests, the improved sensitivity of chest radiographs can lead to earlier detection of lung cancer patients and thus improve the total outcome of health check programs or medical systems. The deterioration of specificity by 0.0192 would lead to an increase in further CT examinations, but we believe that the use of EGGPALE would be beneficial in health check programs owing to its probable improvement of lung cancer detectability.
The AUC of the ensemble ROC curve of the nine radiologists was significantly higher with EGGPALE-processed images than with the original images. In detail, the two ROC curves were nearly identical in the high-specificity, low-sensitivity region, whereas the EGGPALE curve was higher in the low-specificity, high-sensitivity region. This suggests that the proposed method can improve radiologists’ diagnoses, especially when two competing diagnoses are plausible. In other words, EGGPALE may help radiologists correctly diagnose difficult cases.
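How the ensemble ROC curve was constructed is not restated in this section. One plausible construction, assumed in the sketch below, scores each reading unit (here, a lung) by the number of radiologists who marked it positive (0 to 9) and sweeps a vote threshold. The AUC then equals the Mann-Whitney probability that a positive unit receives more votes than a negative one, with ties counted as one half. Function names and the toy data are hypothetical, and the significance test used to compare the two conditions is not shown.

```python
import numpy as np


def ensemble_votes(reads: np.ndarray) -> np.ndarray:
    """Sum binary reads over readers.

    reads: array of shape (n_readers, n_units), 1 = marked positive.
    Returns, per unit, how many readers marked it positive.
    """
    return reads.sum(axis=0)


def roc_points(scores: np.ndarray, labels: np.ndarray):
    """(FPR, TPR) pairs obtained by calling a unit positive if votes >= k."""
    points = []
    for k in range(int(scores.max()) + 1, -1, -1):  # strictest threshold first
        pred = scores >= k
        tpr = pred[labels == 1].mean()
        fpr = pred[labels == 0].mean()
        points.append((float(fpr), float(tpr)))
    return points


def auc_mann_whitney(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy example: 3 readers, 4 lungs (the first two are truly positive).
reads = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
])
labels = np.array([1, 1, 0, 0])
votes = ensemble_votes(reads)  # -> [2 1 1 0]
print(roc_points(votes, labels))
print(auc_mann_whitney(votes, labels))  # -> 0.875
```

With this construction, the high-specificity end of the curve corresponds to thresholds that require many votes and the high-sensitivity end to thresholds that require few, which is where the EGGPALE curve is reported to be higher.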
For comparison with other nodule enhancement methods, we searched for papers in which nodules are enhanced by (1) contrast enhancement, (2) image filtering, or (3) bone suppression. Although many such studies have been reported, as far as our search found, no study on (1) or (2) has demonstrated an improvement in radiologists’ performance in an actual film-reading environment with real X-ray images and nodules. Moreover, we confirmed that no commercially available nodule-enhancing product in the US [45] or EU [46] uses (1) or (2). Only a few studies on (3), including [14, 47], demonstrated an improvement in radiologists’ performance. Therefore, we compared our results with those of these two papers. In [47], at a specificity of 90%, sensitivity increased from 66% to 71% with bone-suppressed images. In [14], sensitivity increased from 49.5% to 66.3%, whereas specificity decreased from 96.1% to 91.8%. We believe that our results are comparable to theirs. Note that the experiments in both papers used second reading, whereas ours used single reading only and were therefore more challenging.
The bone suppression system we used for comparison [42] performed poorly when its output images were read by radiologists without the original images. In both [14, 47], bone suppression techniques were reported to improve radiologists’ sensitivities when a second reading was performed. However, our study used a single reading; that is, the radiologists were blinded to the original image. We therefore believe that bone suppression images provide little advantage to radiologists when the original images are not available. In other words, a bone suppression image is useful for radiologists if and only if the corresponding original image is also available. Indeed, to the best of our knowledge, none of the commercially available bone suppression systems is recommended for single reading.
Fig 6 shows that relatively large deformations occur when the existing abnormality itself is large. In general, the proposed method is designed to change only a relatively small area around an abnormal object when the object is small. In contrast, when the abnormal object is large, a wide or global deformation or density change is inevitable, and such a change severely affects the subtraction image. Note that, in this study, subtraction images are shown for explanatory purposes only; all film-reading experiments were performed using the emphasized (non-subtracted) images.
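Why a global change dominates the subtraction image can be seen in a toy example. The sketch assumes that the subtraction image is simply the emphasized image minus the original (the exact display procedure used for the figures is not described here) and uses a synthetic image, not real data. All names are hypothetical. A small local bump changes only a few percent of the pixels, whereas adding even a slight uniform density shift makes every pixel nonzero, so the local change no longer stands out as an isolated spot.

```python
import numpy as np


def subtraction_image(original: np.ndarray, emphasized: np.ndarray) -> np.ndarray:
    """Signed difference (emphasized minus original) in floating point."""
    return emphasized.astype(np.float64) - original.astype(np.float64)


def changed_fraction(diff: np.ndarray, tol: float = 1e-3) -> float:
    """Fraction of pixels whose absolute difference exceeds tol."""
    return float(np.mean(np.abs(diff) > tol))


rng = np.random.default_rng(0)
original = rng.uniform(0.2, 0.8, size=(64, 64))  # toy stand-in for a radiograph

# Small local change: a Gaussian bump (sigma = 2 px) standing in for an emphasized nodule.
yy, xx = np.mgrid[0:64, 0:64]
bump = 0.1 * np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))
local = original + bump

# Same bump plus a slight uniform density shift, as may occur with a large lesion.
global_shift = original + bump + 0.02

frac_local = changed_fraction(subtraction_image(original, local))
frac_global = changed_fraction(subtraction_image(original, global_shift))
print(frac_local)   # about 0.03: only pixels near the bump change
print(frac_global)  # 1.0: every pixel changes
```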
This work has some limitations. Our main quantitative experiment was performed only with suspected lung cancer cases. Chest radiography is also used for a wide range of other important diseases, such as tuberculosis, pneumonia, and pneumothorax; evaluating the diagnostic benefit of EGGPALE for such diseases is our future work. Another limitation is that an ROC curve for each individual radiologist was not available because only a binary decision was recorded for each lung; it was practically too difficult to record multiple confidence levels for all 1,000 cases in our environment. Evaluating datasets with a multiple-confidence-level rating system is another future task. Finally, our experiment was performed using closed domestic datasets rather than open datasets. However, no large open chest X-ray dataset is available in which the presence or absence of tumor-like lesions has been validated by CT examinations.
In summary, the strengths of this study are: (1) the proposed method is theoretically versatile, allowing potential application across different scenarios; and (2) experimental validation was conducted using a substantial number of X-ray images and involved nine radiologists, enhancing the robustness of the findings. On the other hand, the limitations are: (1) quantitative evaluation across various pathologies was lacking, potentially limiting the generalizability of the results, and training an EGGPALE model on new datasets may be computationally costly; (2) the absence of individual ROC curves for each radiologist reduces the granularity of the analysis and may obscure variations in performance; and (3) the datasets used were closed and relatively small, which could constrain the breadth of insights gained and the applicability of the findings to broader contexts.
Conclusion
A novel general-purpose abnormality enhancement method, EGGPALE, was presented. It improved the sensitivity of radiologists to lung cancer-suspected lesions in chest radiographs: sensitivity improved by 0.0559 on average, at the cost of an average decrease in specificity of 0.0192. Given the statistically significant improvement in the AUC of the ensemble of nine radiologists, we consider the proposed method feasible for lesion enhancement. However, establishing its versatility for various types of lesions remains future work, as does the application of EGGPALE to other modalities such as mammography and head CT.
Supporting information
S1 File. Appendix.
The numerical calculation of S.
https://doi.org/10.1371/journal.pone.0315646.s001
(DOCX)
S2 File. Supplemental material.
Semiquantitative image-reading experiment.
https://doi.org/10.1371/journal.pone.0315646.s002
(DOCX)
References
- 1. Toyoda Y, Nakayama T, Kusunoki Y, Iso H, Suzuki T. Sensitivity and specificity of lung cancer screening using chest low-dose computed tomography. Br J Cancer. 2008;98: 1602–1607. pmid:18475292
- 2. Piccazzo R, Paparo F, Garlaschi G. Diagnostic accuracy of chest radiography for the diagnosis of tuberculosis (TB) and its role in the detection of latent TB infection: a systematic review. J Rheumatol. 2014. Available: https://www.jrheum.org/content/91/32.abstract pmid:24788998
- 3. Singh R, Kalra MK, Nitiwarangkul C, Patti JA, Homayounieh F, Padole A, et al. Deep learning in chest radiography: Detection of findings and presence of change. PLoS One. 2018;13: e0204155. pmid:30286097
- 4. Forrest JV, Friedman PJ. Radiologic errors in patients with lung cancer. West J Med. 1981;134: 485–490. Available: https://www.ncbi.nlm.nih.gov/pubmed/7257363 pmid:7257363
- 5. Quekel LG, Kessels AG, Goei R, van Engelshoven JM. Miss rate of lung cancer on the chest radiograph in clinical practice. Chest. 1999;115: 720–724. pmid:10084482
- 6. Tudor GR, Finlay D, Taub N. An assessment of inter-observer agreement and accuracy when reporting plain radiographs. Clin Radiol. 1997;52: 235–238. pmid:9091261
- 7. Gonzalez RC, Woods RE. Digital image processing. Upper Saddle River, NJ: Prentice Hall; 2002. https://mirrors.nju.edu.cn/pub/CTAN/biblio/bibtex/contrib/persian-bib/Persian-bib-userguide.pdf
- 8. Hummel R. Image enhancement by histogram transformation. Comput Graph Image Process. 1977.
- 9. Senthilkumar R. Triad histogram to enhance chest X-ray image. Int J Adv Res Comput Commun Eng. 2014; 8577–8580.
- 10. Kwan BY, Kwan HK. Improved lung nodule visualization on chest radiographs using digital filtering and contrast enhancement. Proc World Acad of Sci Eng Technol. 2011;110: 590–593. Available: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.853.7080&rep=rep1&type=pdf
- 11. Bao C, Sheng C. A parameterized logarithmic image processing method based on Laplacian of Gaussian filtering for lung nodules enhancement in chest radiographs. 2013 2nd International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA). IEEE; 2013. pp. 649–652.
- 12. Shi Z, Zhao M, Wang Y, He L, Suzuki K, Jin C, et al. Hessian-LoG: a novel dot enhancement filter. ICIC Exp Lett. 2012;6: 1987–1992. Available: http://mypages.iit.edu/~ksuzuki/pdfs/coauthor/ShiZEtAl_Hessian-LoG_ICICExpLet2012.pdf
- 13. Alavijeh FS, Mahdavi-Nasab H. Multi-scale Morphological Image Enhancement of Chest Radiographs by a Hybrid Scheme. J Med Signals Sens. 2015;5: 59–68. Available: https://www.ncbi.nlm.nih.gov/pubmed/25709942 pmid:25709942
- 14. Freedman MT, Lo S-CB, Seibel JC, Bromley CM. Lung nodules: improved detection with software that suppresses the rib and clavicle on chest radiographs. Radiology. 2011;260: 265–273. pmid:21493789
- 15. Oda S, Awai K, Suzuki K, Yanaga Y, Funama Y, MacMahon H, et al. Performance of radiologists in detection of small pulmonary nodules on chest radiographs: effect of rib suppression with a massive-training artificial neural network. AJR Am J Roentgenol. 2009;193: W397–402. pmid:19843717
- 16. Suzuki K, Abe H, MacMahon H, Doi K. Image-processing technique for suppressing ribs in chest radiographs by means of massive training artificial neural network (MTANN). IEEE Trans Med Imaging. 2006;25: 406–416. pmid:16608057
- 17. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42: 60–88. pmid:28778026
- 18. Pan I, Cadrin-Chênevert A, Cheng PM. Tackling the Radiological Society of North America Pneumonia Detection Challenge. AJR Am J Roentgenol. 2019;213: 568–574. pmid:31120793
- 19. Pande T, Cohen C, Pai M, Ahmad Khan F. Computer-aided detection of pulmonary tuberculosis on digital chest radiographs: a systematic review. Int J Tuberc Lung Dis. 2016;20: 1226–1230. pmid:27510250
- 20. Shih G, Wu CC, Halabi SS, Kohli MD, Prevedello LM, Cook TS, et al. Augmenting the National Institutes of Health Chest Radiograph Dataset with Expert Annotations of Possible Pneumonia. Radiology: Artificial Intelligence. 2019;1: e180041. pmid:33937785
- 21. Qin C, Yao D, Shi Y, Song Z. Computer-aided detection in chest radiography based on artificial intelligence: a survey. Biomed Eng Online. 2018;17: 113. pmid:30134902
- 22. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-Ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017.
- 23. Nalisnick E, Matsukawa A, Teh YW, Gorur D, Lakshminarayanan B. Do Deep Generative Models Know What They Don’t Know? arXiv [stat.ML]. 2018. http://arxiv.org/abs/1810.09136
- 24. Kingma DP, Welling M. Auto-Encoding Variational Bayes. arXiv [stat.ML]. 2013. http://arxiv.org/abs/1312.6114v10
- 25. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, editors. Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2014. https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
- 26. Yi X, Walia E, Babyn P. Generative adversarial network in medical imaging: A review. Med Image Anal. 2019;58: 101552. pmid:31521965
- 27. Kingma DP, Dhariwal P. Glow: Generative Flow with Invertible 1x1 Convolutions. arXiv [stat.ML]. 2018. http://arxiv.org/abs/1807.03039
- 28. Sato D, Hanaoka S, Nomura Y, Takenaga T, Miki S, Yoshikawa T, et al. A primitive study on unsupervised anomaly detection with an autoencoder in emergency head CT volumes. Medical Imaging 2018: Computer-Aided Diagnosis. International Society for Optics and Photonics; 2018. p. 105751P.
- 29. Chen X, Konukoglu E. Unsupervised Detection of Lesions in Brain MRI using constrained adversarial auto-encoders. arXiv [cs.CV]. 2018. http://arxiv.org/abs/1806.04972
- 30. Schlegl T, Seeböck P, Waldstein SM, Schmidt-Erfurth U, Langs G. Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery. Information Processing in Medical Imaging. Springer International Publishing; 2017. pp. 146–157.
- 31. Zenati H, Romain M, Foo C-S, Lecouat B, Chandrasekhar V. Adversarially Learned Anomaly Detection. 2018 IEEE International Conference on Data Mining (ICDM). 2018. pp. 727–736.
- 32. Nakao T, Hanaoka S, Nomura Y, Murata M, Takenaga T, Miki S, et al. Unsupervised Deep Anomaly Detection in Chest Radiographs. J Digit Imaging. 2021;34: 418–427. pmid:33555397
- 33. Schlegl T, Seeböck P, Waldstein SM, Langs G, Schmidt-Erfurth U. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Med Image Anal. 2019;54: 30–44. pmid:30831356
- 34. Shibata H, Hanaoka S, Nomura Y, Nakao T, Sato I, Sato D, et al. A versatile anomaly detection method for medical images with a flow-based generative model in semi-supervision setting. Int J CARS (In Press). 2021.
- 35. Tang Y, Tang Y, Zhu Y, Xiao J, Summers RM. A disentangled generative model for disease decomposition in chest X-rays via normal image synthesis. Med Image Anal. 2021;67: 101839. pmid:33080508
- 36. Salehinejad H, Colak E, Dowdell T, Barfett J, Valaee S. Synthesizing Chest X-Ray Pathology for Training Deep Convolutional Neural Networks. IEEE Trans Med Imaging. 2019;38: 1197–1206. pmid:30442603
- 37. Segal B, Rubin DM, Rubin G, Pantanowitz A. Evaluating the Clinical Realism of Synthetic Chest X-Rays Generated Using Progressively Growing GANs. SN Computer Science. 2021;2: 321. pmid:34104898
- 38. Dinh L, Krueger D, Bengio Y. NICE: Non-linear Independent Components Estimation. arXiv [cs.LG]. 2014. http://arxiv.org/abs/1410.8516
- 39. Dinh L, Sohl-Dickstein J, Bengio S. Density estimation using Real NVP. arXiv [cs.LG]. 2016. http://arxiv.org/abs/1605.08803
- 40. Brauer C, Lorenz DA, Tillmann AM. A Primal-Dual Homotopy Algorithm for ℓ1-Minimization with ℓ∞-Constraints. In: Optimization Online [Internet]. [cited 29 Apr 2024]. https://optimization-online.org/2016/10/5700/
- 41. Choi J, Demmel J, Dhillon I, Dongarra J, Ostrouchov S, Petitet A, et al. ScaLAPACK: a portable linear algebra library for distributed memory computers—design issues and performance. Comput Phys Commun. 1996;97: 1–15.
- 42. Chương HM. ML-BoneSuppression. GitHub. https://github.com/hmchuong/ML-BoneSuppression
- 43. Beyer F, Zierott L, Fallenberg EM, Juergens KU, Stoeckel J, Heindel W, et al. Comparison of sensitivity and reading time for the use of computer-aided detection (CAD) of pulmonary nodules at MDCT as concurrent or second reader. European Radiology. 2007. pp. 2941–2947. pmid:17929026
- 44. Sim Y, Chung MJ, Kotter E, Yune S, Kim M, Do S, et al. Deep Convolutional Neural Network–based Software Improves Radiologist Detection of Malignant Lung Nodules on Chest Radiographs. Radiology. 2020;294: 199–209. pmid:31714194
- 45. aicentral. In: aicentral [Internet]. [cited 10 Jun 2022]. https://aicentral.acrdsi.org/
- 46. grand-challenge.org. In: grand-challenge.org [Internet]. [cited 10 Jun 2022]. https://grand-challenge.org/aiforradiology/
- 47. Schalekamp S, van Ginneken B, Meiss L, Peters-Bax L, Quekel LGBA, Snoeren MM, et al. Bone suppressed images improve radiologists’ detection performance for pulmonary nodules in chest radiographs. Eur J Radiol. 2013;82: 2399–2405. pmid:24113431