Abstract
Background
In nuclear medicine, normalized mean square error (NMSE) is widely used for image quality evaluation and machine adjustment. However, evaluating clinical images in nuclear medicine using NMSE necessitates acquiring a reference image, which is time consuming and impractical. Therefore, it is necessary to explore no-reference metrics, such as perception-based image quality evaluator (PIQE) and natural image quality evaluator (NIQE), as alternatives for evaluating the quality of clinical images used in nuclear medicine.
Purpose
To examine whether no-reference metrics can be applied to image quality evaluations for clinical images in nuclear medicine.
Methods
Images of the Hoffman Brain Phantom containing 18F–fluoro-2-deoxy-D-glucose (FDG) were obtained using a Biograph Vision (Siemens Co., Ltd). From the collected images, 14 additional images with varying pixel counts and acquisition times were created, and the resulting 16 images were visually evaluated by five image experts and ranked accordingly. Image quality was assessed using NMSE, PIQE, and NIQE, and rankings were calculated based on these scores.
Results
The Spearman’s significance test revealed a strong correlation between image quality evaluations using PIQE and visual evaluations by specialists (p<0.0001). PIQE demonstrated comparable performance to image experts in evaluating image quality, suggesting its potential for clinical image quality assessment in nuclear medicine.
Citation: Higashiyama S, Katayama Y, Yoshida A, Inoue N, Yamanaga T, Ichida T, et al. (2024) Investigation of the effectiveness of no-reference metric in image evaluation in nuclear medicine. PLoS ONE 19(11): e0310305. https://doi.org/10.1371/journal.pone.0310305
Editor: Sadiq H. Abdulhussain, University of Baghdad, IRAQ
Received: September 7, 2023; Accepted: August 29, 2024; Published: November 21, 2024
Copyright: © 2024 Higashiyama et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The advent of artificial intelligence (AI)-based image processing approaches, such as generative adversarial network (GAN)-based models, has sparked significant interest in image quality assessment [1, 2]. However, traditional full-reference metrics such as peak signal-to-noise ratio (PSNR) or structural similarity (SSIM) may not effectively evaluate images generated using GANs [3, 4]. In contrast, no-reference metrics offer a promising solution for evaluating image quality when a reference image is unavailable.
Although full-reference metrics are commonly used in medical image evaluation, particularly in nuclear medicine, their reliance on reference images limits their applicability in clinical settings [5]. Normalized mean square error (NMSE), a prevalent full-reference metric, requires a long-term captured reference image corresponding to the target image, making it impractical for clinical evaluations [6–8]. Additionally, the lack of common training and standard image data further complicates image evaluation in nuclear medicine [9, 10]. For positron emission tomography (PET) images, efforts such as quantitative imaging biomarkers and harmonization have been made to evaluate pixel values obtained from images captured using different devices as comparable indicators [11, 12]. However, these are primarily intended to use pixel values as quantitative biomarkers and do not achieve image standardization [13, 14]. To address these challenges, this study investigated the efficacy of no-reference metrics, specifically PIQE and NIQE, in evaluating image quality for clinical images in nuclear medicine [15–17]. By comparing the results of no-reference metric evaluations with visual evaluations by specialists, we demonstrated the potential of these metrics in clinical practice.
Contributions and findings
- We proposed using no-reference metrics, namely PIQE and NIQE, for the evaluation of image quality in clinical nuclear medicine.
- We demonstrated a strong correlation between the results of PIQE and visual evaluation by specialists, indicating the potential of PIQE in clinical image quality assessment.
- We confirmed the feasibility of utilizing no-reference metrics as alternative methods for image evaluation in nuclear medicine.
Materials and methods
Image acquisition method and analysis
The images were obtained using a Hoffman 3D brain phantom (Data Spectrum Co., Ltd) containing 26 MBq of 18F–fluoro-2-deoxy-D-glucose (FDG), with 3D acquisition mode and an imaging time of 1800 s. The images were collected according to the protocol for brain PET imaging distributed by the Japanese Society of Nuclear Medicine and the PET Nuclear Medicine Committee [18]. A Biograph Vision 450 (Siemens Co., Ltd.), used for clinical examinations at our hospital, was used for imaging and data collection.
For this study, one axial image slice was selected from the acquired brain phantom images, depicting the frontal/temporal lobe and bilateral ventricles of the bilateral cerebral hemisphere and basal ganglia. Such images are commonly used in the study of brain PET images using phantoms [19, 20].
For evaluation, we prepared images with seven additional acquisition times: 120, 180, 300, 360, 450, 600, and 900 s. The collection matrix for all eight acquisition times (including the 1800 s acquisition) was 440 pixels. To assess images with different pixel counts, 880-pixel images (the matrix used for clinical examinations at our facility) corresponding to each of the eight acquisition times were also generated. Images with varying acquisition times and pixel counts were created and extracted using the Biograph Vision 450.
The imaging and image reconstruction conditions were as follows:
- Pixel size: 0.825 × 0.825 mm
- FOV: 363 mm
- Slice thickness: 3 mm
- Reconstruction conditions: Ordinary Poisson ordered-subsets expectation maximization with point-spread function and time-of-flight modeling of 214 ps
- Random correction: Delayed coincidence measurement
- Single scatter simulation
- Subset: 5
- Iteration: 8
- Filter: all-pass
- Voxel size for 440 matrix: 0.825 × 0.825 × 2 mm³
- Voxel size for 880 matrix: 0.4125 × 0.4125 × 2 mm³
- Computed tomography attenuation correction
- 80 mAs
- 120 kV
- Slice thickness: 3 mm
- Pitch: 0.55 mm
- MATLAB (MathWorks Co., Ltd) was used to calculate the noise metric
- JMP (SAS Japan Co., Ltd) was used for statistical analysis
Evaluator and image
Three qualified diagnostic radiologists and nuclear medicine specialists, along with four nuclear medicine technologists, were selected to visually evaluate the images. Among the four technologists, two had over 15 years of clinical experience in nuclear medicine, while the other two were inexperienced.
Fig 1 depicts an image captured at 1800 s; a total of eight images were captured at different acquisition times: 120, 180, 300, 360, 450, 600, 900, and 1800 s. The matrix size of the images in Fig 1 is 440 pixels. Fig 2 displays the eight images with an 880 matrix size corresponding to each acquisition time shown in Fig 1.
One slice of the axial image that depicts the frontal and temporal lobes, bilateral lateral ventricles, and basal ganglia was selected from the acquired brain phantom images. Images with different collection times (120, 180, 300, 360, 450, 600, 900, and 1800 s) were prepared with 326 × 188 pixels in RGB (239 K). A total of eight image types with different collection times and color scales are shown.
Images with different collection times were prepared with 326 × 188 pixels in RGB (239 K).
The following two items were defined as the visual evaluation criteria:
(1) A clear delineation of the basal ganglia limbus and its clear separation from the cerebral white and gray matter.
(2) Uniform accumulation of FDG in the basal ganglia and cerebral white matter.
Previous studies addressed image quality by quantifying "the contrast between gray-matter structures and a white matter structure" and determining "the sharpness of the gray/white-matter" [19, 20].
The aforementioned criteria were set to evaluate the sliced image used in this study. The evaluation method necessitated numerical ranking. Therefore, we employed a paired comparison method to rank the evaluations by the evaluator, referencing previous research [21].
Visual evaluation method
The paired comparison method was used for visual evaluation: two images were displayed on the left and right sides of a monitor. From the 16 images, two different images were selected for each pair so that an image was never compared with itself. By pairing different images on the left and right sides, a total of 240 ordered pairs were prepared and displayed randomly. The evaluator was unaware of which two images would be presented. In Fig 3, a 440-pixel image acquired at 180 s is displayed on the left, and an 880-pixel image acquired at 900 s is displayed on the right. Numbers corresponding to all 240 pairs are shown in Fig 4, which serves as a score entry sheet; Fig 3 corresponds to square 29 in Fig 4.
An image pair presented to the evaluators: a 440-pixel image acquired at 180 s is displayed on the left, and an 880-pixel image acquired at 900 s on the right. This pair corresponds to square number 29 in Fig 4.
When the evaluator records a score, the initially blank cell in Fig 4 is filled in for the left and right images compared using the pairwise method. The rating is the total score in the rightmost column and the bottom row.
Fourteen evaluation score sheets (Fig 4) were prepared for the seven evaluators to assess items 1 and 2. The table in Fig 4 was not shown to the evaluators, who visually evaluated the 240 images displayed in a random order. If the image displayed on the right side was of better quality than that on the left side, one point was assigned to the cell in Fig 4 for that image.
As shown in Fig 3, if the image on the right showed more uniform accumulation of FDG in the basal ganglia and white matter, the cell in square 29 of the evaluation score sheet for item 2 was assigned a score of 1. Two images identical to Fig 3 were included in the presentation but arranged in opposite directions, that is, the 900 s image with 880 pixels was presented on the left side, and the 180 s image with 440 pixels was presented on the right side corresponding to square 212 in Fig 4. In this case, if the image on the left side was better, a score of 0 was assigned to square 212.
Previous reports have visually scored PET images with different acquisition times and the degree of glucose metabolism and malignancy in thyroid tumors on a 5-point scale [22, 23]. Based on these reports, we scored the images based on their acquisition times and pixel count.
When the evaluator recorded the score, it was entered for the left and right images compared using the pairwise method. For the displayed image, as shown in Fig 3, a score of 0 or 1 was recorded in the corresponding cell of Fig 4. This was performed for evaluation items 1 and 2.
Higher total scores in the bottom row of Fig 4 represent better results, whereas lower total scores in the rightmost column represent better results. These scores were totaled, and the average values were calculated to obtain the visual evaluation scores and ranks.
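The aggregation of a paired-comparison score sheet into a ranking can be sketched as follows. This is a minimal illustration, not the authors' actual scoring tooling; it assumes a cell value of 1 means the right-hand image was judged better, and the unused diagonal stays 0:

```python
import numpy as np

def rank_images(score, labels):
    """Rank images from a paired-comparison score sheet.

    score[i, j] = 1 if the right-hand image j was judged better than the
    left-hand image i, else 0. The diagonal (an image paired with itself,
    which was never shown) remains 0.
    """
    n = len(labels)
    wins_as_right = score.sum(axis=0)        # column sums (bottom row): higher is better
    losses_as_left = score.sum(axis=1)       # row sums (rightmost column): lower is better
    total_wins = wins_as_right + (n - 1 - losses_as_left)
    order = np.argsort(-total_wins)          # best image first
    return [labels[i] for i in order]
```

For the 16 phantom images this corresponds to a 16 × 16 sheet with 240 off-diagonal cells, matching the 240 randomly ordered pairs presented to each evaluator.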
Evaluation of NMSE
For physical evaluation, we used a physical index based on the NMSE, which has been conventionally used to calculate the similarity between reference and target images. The ideal and acquired images were used as the reference and target images, respectively [24]. NMSE normalizes the squared error between the two images by the total squared pixel values of the reference image. The smaller the calculated value, the closer the target image is to the ideal image [24]. The computation is shown in Eq 1.
NMSE = Σ_{x, y}[f(x, y) − g(x, y)]² / Σ_{x, y}f(x, y)²  (1)
where f(x, y) refers to the reference image, and g(x, y) refers to the target image.
The reference image for the 440-pixel images obtained with acquisition times of 120, 180, 300, 360, 450, 600, and 900 s was the 440-pixel image with an acquisition time of 1800 s. The NMSE for each of the seven shorter acquisition times was calculated. NMSE values were calculated for the 880-pixel images in the same manner.
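A minimal sketch of the NMSE computation, assuming the common definition in which the summed squared error is normalized by the energy of the reference image:

```python
import numpy as np

def nmse(reference, target):
    """Normalized mean square error: 0 means the target equals the reference."""
    f = reference.astype(np.float64)  # f(x, y): reference image (1800 s acquisition)
    g = target.astype(np.float64)     # g(x, y): target image (shorter acquisition)
    return float(((f - g) ** 2).sum() / (f ** 2).sum())
```

Because f and g must share the same matrix size, the 440-pixel and 880-pixel series each need their own 1800 s reference, as described above.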
Evaluation of no-reference metric
PIQE is a no-reference perception-based image quality evaluation method for real-world images. It uses the mean subtraction contrast normalization coefficients to calculate the image quality score [15]. NIQE is an existing blind image quality evaluation method that predicts quality scores from a statistical model fitted to natural images [25]. In contrast, PIQE is an unsupervised method that does not require a learning model [15].
PIQE is inspired by the following principles of human perception of image quality. First, human visual attention is strongly directed to prominent points in an image, that is, spatially active areas; this property is exploited by estimating distortions only in spatially prominent areas [8]. Second, the overall quality that humans perceive is driven by local quality at the block/patch level; this property is addressed by calculating the distortion level in local blocks of size n × n, where n = 16 [15].
Fig 5 shows a block diagram of the PIQE method [15]. The input image is preprocessed, followed by a block-level analysis to identify the distortion [15]. Each distorted block is assigned a score based on the distortion type, and the block-level scores are then pooled to determine the overall image quality. In addition to the quality score, PIQE also generates a spatial quality map that can be effectively used in other applications.
The input image was subjected to a preprocessing step. A block-level analysis was performed to identify the distortion, and each distorted block was assigned a score based on the distortion type. The block-level scores were pooled to determine the overall image quality.
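The block-level statistics that PIQE builds on can be illustrated with mean-subtracted contrast-normalized (MSCN) coefficients. The sketch below uses a uniform local window in place of the Gaussian weighting of the published method, so it is illustrative only:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mscn(img, win=7, c=1.0):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image.

    Illustrative only: uses a uniform local window rather than the Gaussian
    weighting of the published PIQE/NSS formulation.
    """
    pad = win // 2
    padded = np.pad(img.astype(np.float64), pad, mode='reflect')
    windows = sliding_window_view(padded, (win, win))
    mu = windows.mean(axis=(2, 3))   # local mean
    sd = windows.std(axis=(2, 3))    # local contrast
    return (img - mu) / (sd + c)

def block_variances(coeffs, n=16):
    """Variance of the MSCN coefficients in each n x n block (PIQE uses n = 16)."""
    h, w = coeffs.shape
    return [coeffs[i:i + n, j:j + n].var()
            for i in range(0, h - n + 1, n)
            for j in range(0, w - n + 1, n)]
```

Spatially active blocks (high MSCN variance) are the ones examined for distortion; the resulting block scores are then pooled into the overall quality score, as described above.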
In contrast, NIQE uses only measurable deviations from statistical regularities observed in natural images to calculate image quality scores in a completely blind manner [25]. It builds a collection of "quality-aware" statistical features based on a simple and successful spatial domain natural scene statistics (NSS) model [26, 27].
The distorted image quality is expressed as a simple distance metric between the model statistic and distorted image statistic [17]. Lower PIQE and NIQE scores indicate better imaging evaluations [15, 25].
No-reference metrics do not require a reference image. Therefore, the image quality was evaluated using PIQE and NIQE for both the 440-pixel and 880-pixel images obtained with eight different acquisition times: 120, 180, 300, 360, 450, 600, 900, and 1800 s.
Spearman’s rank correlation test was performed. It is used in studies comparing interpretation results from AI-based methods with those of experienced readers, and in comparisons between human observers and mathematical models such as the channelized Hotelling observer [28, 29]. The significance level was set at P < 0.05.
To demonstrate that there was no significant difference in the ranking of the PIQE results, the differences between adjacent PIQE values within each ranking group from 1st to 16th place were calculated, resulting in 13 values. These 13 data points were divided into three groups, and Mann–Whitney’s U test was performed on them. The significance level was set at P < 0.05.
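The rank comparison itself is straightforward to reproduce. A minimal sketch of Spearman's rank correlation, valid for tie-free rankings such as the 1st to 16th orderings used here (the study used JMP; this helper is our own illustration):

```python
import numpy as np

def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors.

    Valid as written only for tie-free data, e.g. two full orderings of the
    same 16 images.
    """
    ra = np.argsort(np.argsort(rank_a)).astype(np.float64)
    rb = np.argsort(np.argsort(rank_b)).astype(np.float64)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```

A value near +1 means the two orderings nearly coincide, while a value near 0 means they are essentially unrelated.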
Evaluation of uniformity
To assess uniformity, a region of interest (ROI) was set on each image subjected to visual evaluation, and pixel values were measured. ROIs were positioned at the medulla of the frontal, temporal, and occipital lobes, ensuring that one edge of each ROI could be measured without crossing the boundary between the cortex and medulla. Fig 6 shows the ROI locations. The arrow in Fig 6 indicates the location of the ROI in the frontal lobe, the double arrow indicates the location of the ROI in the temporal lobe, and the arrowhead indicates the location of the ROI in the occipital lobe. Referring to previous literature, the size of each ROI was set to 5 mm in diameter [30, 31].
The arrow indicates the location of the ROI in the frontal lobe, the double arrow indicates the location of the ROI in the temporal lobe, and the arrowhead indicates the location of the ROI in the occipital lobe. Referring to previous literature, the size of the ROI was set to 5 mm in diameter.
Numerical evaluation was performed using the coefficient of variation (CV). CV was calculated for the images with each pixel count and acquisition time. The calculation is shown in Eq 2.
CV = σ / x̄ × 100 (%)  (2)

where σ is the standard deviation (SD) and x̄ is the average pixel value.
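A sketch of the CV computation over the pixel values of one ROI. Whether the sample or population SD was used is not stated in the paper, so the sample SD here is an assumption:

```python
import statistics

def coefficient_of_variation(pixel_values):
    """CV (%) = SD / mean * 100 over the pixel values measured in an ROI."""
    mean = statistics.mean(pixel_values)
    sd = statistics.stdev(pixel_values)  # sample SD; an assumption (see lead-in)
    return sd / mean * 100.0
```

A lower CV indicates more uniform FDG accumulation within the medullary ROI.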
Ethics statement
This study exclusively utilized phantom images and did not involve the use of clinical images or human imaging data. As such, there was no requirement for approval from the Ethics Committee. Furthermore, as patient image data were not utilized, no explanation or consent was sought from any patient.
Results
Results of visual evaluation
Tables 1 and 2 show the scores of the two items for each image from all raters, obtained using the paired comparison method. Evaluators 6 and 7, who were inexperienced, were excluded from the analysis as their results tended to differ from those of the other evaluators.
Fig 4 shows the scoring sheet used to enter 0 or 1 to indicate the superior image in the paired comparison method. The bottom row of Fig 4 contains the vertical sums of these numbers for each image; Table 1 presents the results of that row, where higher scores represent better evaluation results. The rightmost column of Fig 4 contains the horizontal sums for each image; Table 2 presents the results of that column, where lower scores represent better evaluation results.
The average visual evaluation results obtained using the pairwise comparison method for all items are shown in Table 3.
In Table 3, the scores of the 880-pixel images acquired at each acquisition time are highlighted in bold. For both the 880-pixel and 440-pixel images, higher scores were achieved with longer acquisition times. Additionally, for images with acquisition times other than 120 s, a higher score was obtained when the pixel count was 880.
Results of evaluation using NMSE
The reference images for the 440-pixel and 880-pixel images obtained with acquisition times of 120, 180, 300, 360, 450, 600, and 900 s were the 440- and 880-pixel images, respectively, with an acquisition time of 1800 s. The evaluation values for the images with the seven shorter acquisition times were calculated using NMSE. The results are shown in Tables 4 and 5.
Table 6 summarizes all the results, arranged in order of NMSE score. For most images, the NMSE value improved and approached 0 as the acquisition time increased. Additionally, the physical evaluation results of the 880-pixel images were better than those of the 440-pixel images.
Results of evaluation using PIQE and NIQE
The results of the physical evaluation using PIQE are presented in Table 7. Images with a lower no-reference metric value, longer acquisition time, and 880 pixels showed better results. Spearman’s significance test of the visual evaluation results and PIQE rankings showed a rank correlation coefficient (rs) of 0.9559 (p < 0.0001), indicating a strong correlation between the two methods (Fig 7).
PIQE: perception-based image quality evaluator. Spearman’s rank correlation test between the visual assessment and PIQE rankings revealed an rs of 0.9559 (p < 0.0001), indicating a strong correlation.
Lower scores represent better image quality.
Fig 8 shows the results of Mann–Whitney’s U test, in which the PIQE differences were divided into three groups based on the numerical rankings. Group 1 consisted of the differences between 1st and 4th place, comprising three values; group 2 included the differences between 5th and 12th place, comprising seven values; and group 3 comprised the three values from 13th to 16th place. No significant difference was observed among these groups: p = 0.3619 between groups 1 and 2, p = 0.175 between groups 2 and 3, and p = 0.833 between groups 1 and 3.
The difference between the top and bottom ranks from 1st to 16th. The difference between 1st and 4th place was group 1, the difference between 5th and 12th place was group 2, and the difference between 13th and 16th place was group 3. There was no significant difference among the three groups.
Results of the physical evaluation using NIQE are shown in Table 8. Spearman’s rank correlation test of the visual evaluation and NIQE rankings yielded an rs of 0.2324 (p = 0.3865), indicating no significant correlation between the two methods (Fig 9).
NIQE: natural image quality evaluator. Spearman’s rank correlation test between the visual assessment and NIQE rankings revealed an rs of 0.2324 (p = 0.3865), indicating no strong correlation between the two methods.
Results of the uniformity evaluation
Tables 9–11 show the numerical results of the uniformity rating for the three areas: the frontal lobe, the temporal lobe, and the occipital lobe. Figs 10–12 are graphs of the numerical results in Tables 9–11.
Discussion
In this study, we examined whether no-reference metrics can be applied to the quality evaluation of clinical images in nuclear medicine. The visual assessments of the images by five raters were compared with the NMSE, PIQE, and NIQE results, and statistical correlations were determined. Evaluation using PIQE demonstrated a strong correlation with visual assessment, suggesting equivalence between these two methods.
The results ranked by evaluators 6 and 7 were inconsistent compared to the other evaluators. Consequently, their evaluations were excluded, underscoring the validity of the evaluators’ selection. This also underscores that the evaluation criteria are not easily applied by any evaluator.
Because NMSE evaluates the target image using a reference image, it is generally impossible to evaluate images with different numbers of pixels. In this study, images with different acquisition times were evaluated using NMSE scores, using different references for 880- and 440-pixel images.
From the PIQE results, when the proportion of statistical noise was approximately the same, higher resolution was associated with a higher evaluation. This trend was also reflected in the visual assessments, indicating that PIQE can objectively evaluate not only statistical noise but also resolution differences. The images ranked 1st to 4th in the PIQE results in Table 7 are arranged in the correct order with respect to acquisition time and pixel count, which also agrees with the visual evaluation results in Table 1. The low-quality images ranked 13th to 16th in Table 7 likewise reflect acquisition time and pixel count. Although there is a ranking reversal between the 15th and 16th images in the visual assessment, it pertains to low-ranking images that are typically not accepted in clinical imaging. The discrepancy in the visual evaluation ranking by the image experts is believed to be due to their unfamiliarity with low-resolution images. In the 5th to 12th ranks, from 880-600s to 440-300s, PIQE favored images with a higher pixel count over those with a longer acquisition time; within this range, the images were arranged in order of acquisition time. In the visual evaluation by the image experts, the order of pixel count and acquisition time matched, unlike the results obtained from PIQE; it is considered that sharpness is prioritized over noise in this image quality range. Although the visual evaluation and PIQE rankings differed within this range, Spearman’s rank correlation test showed no significant difference between the rankings, and PIQE is therefore considered to have the same evaluation ability as visual evaluation. In addition, we calculated the differences between adjacent rankings in the PIQE values, that is, 880-1800s to 440-900s ranked 1st to 4th, 880-600s to 440-300s ranked 5th to 12th, and 880-180s to 440-120s ranked 13th to 16th.
Intergroup comparisons were performed using Mann–Whitney’s U test among the three groups, and no significant differences were found. PIQE thus demonstrated the capability to evaluate the image quality of the 880-600s to 440-300s images, ranked 5th to 12th, at a level comparable to that of visual evaluation.
Uniformity was evaluated, and as shown in Figs 10–12, both the 440- and 880-pixel images showed that the longer the imaging time, the higher the uniformity of the image. These results were almost consistent with the visual evaluation.
Visual evaluation of images from 1800 to 180 s showed that a longer acquisition time resulted in better evaluation scores (Table 3). The results and rankings obtained using NMSE were similar (Table 6). For images with the same acquisition time, 880-pixel images scored better than 440-pixel images (Table 3).
For images with an acquisition time of 120 s, the difference in ranking between 440 and 880 pixels was less than 0.2 points, which is a much smaller difference compared with that of the other rankings; however, it reversed the visual evaluation rankings (Table 3).
Evaluators 4 and 5 evaluated the ranking of 440- and 880-pixel images with a 120 s acquisition time, reversing the rating order for items 1 and 2 (Tables 1 and 2). They were diagnostic radiologists with more than 10 years of clinical experience and nuclear medicine specialists. This evaluation reversed the average rankings for the 440- and 880-pixel images at 120 s. For item 1, both evaluators found that the boundary between the white matter and gray matter of the temporal lobe and the peripapillary thalamus was clearer in the 440-pixel image because it had a wider area without accumulation. For item 2, the 440-pixel image showed more uniform accumulation because of a denser accumulation in the frontotemporal white matter, thalamus, and caudate nucleus in general. Noisy images acquired at 120 s were not of optimal quality for use in clinical imaging.
The rankings obtained from the visual evaluations by the five evaluators and from the no-reference metric methods were compared. Generally, supervised methods outperform unsupervised methods [26]. However, when creating a dataset for supervised learning in nuclear medicine, which is not well standardized, generating a standard image is not easy [13, 14]. It is therefore more realistic to perform a general-purpose quantitative evaluation that does not depend on a supervised-learning target [13, 14]. In this study, PIQE, an unsupervised method that does not require training data to evaluate image quality, yielded better results [15, 27]. Moreover, because PIQE does not depend on training data, it is a less environment-dependent metric that can be applied on the same scale at all facilities conducting nuclear medicine examinations and imaging. Hence, PIQE may be an efficient image evaluation method.
The NIQE results showed no correlation with the visual evaluation results. This could be because NIQE employs a model learned from natural scene statistics, which may not transfer well to nuclear medicine images [17].
Similar to natural images, PET images follow a Poisson distribution in their image-generation process [32]. Because no-reference quality metrics can match subjective human quality scores better than full-reference quality metrics, PET image evaluation using a no-reference metric was expected to be useful. This is another reason why PIQE is more consistent with visual evaluation than NIQE.
Numerical evaluation plays an important role in image evaluation and medical treatment [33, 34]. Various evaluations have been conducted without setting a gold standard, and methods that do not require reference images are expected to gain wider acceptance for scoring and ranking image quality in the future [35, 36].
Despite its strengths, this study had limitations, notably the absence of clinical imaging based on brain phantom images. Nonetheless, our findings suggest that PIQE may be comparable to visual evaluation by radiologists and specialists, offering potential applications in clinical image evaluation across various anatomical regions.
Conclusions
In conclusion, this study investigated the application of no-reference metrics, specifically PIQE, in evaluating image quality for clinical images in nuclear medicine. The results demonstrate that PIQE evaluations align closely with visual evaluations by specialists, suggesting its potential as a reliable method for clinical image quality assessment. Moving forward, additional research and validation are warranted to fully integrate no-reference metrics into routine clinical practice in nuclear medicine.
Acknowledgments
We are grateful to the radiological technicians at the Department of Radiology, Osaka Metropolitan University Hospital.
References
- 1. Ueda D., Katayama Y., Yamamoto A., Ichinose T., Arima H., Watanabe Y., et al. Deep Learning–based angiogram generation model for cerebral angiography without misregistration artifacts. Radiology. 2021; 299(3): 675–681. pmid:33787336
- 2. Zhu X., Cheng Y., Peng J., Wang R., Le M., & Liu X. Super-Resolved Image Perceptual Quality Improvement via Multi-Feature Discriminators. Computer Vision and Pattern Recognition. 2019.
- 3. Blau Y., Mechrez R., Timofte R., Michaeli T., & Zelnik-Manor L. The 2018 PIRM Challenge on Perceptual Image Super-resolution. arXiv:1809.07517v3 [cs.CV]. 2019.
- 4. Blau Y., & Michaeli T. The Perception-Distortion Tradeoff. arXiv:1711.06077v4 [cs.CV]. 2020.
- 5. Yu Z., Rahman M. A., Schindler T. H., Gropler R. J., Laforest R., Wahl R. L., et al. Need for Objective Task-based Evaluation of Deep Learning-Based Denoising Methods: A Study in the Context of Myocardial Perfusion SPECT. Journal of Nuclear Medicine. 2020; 61(supplement 1): 575.
- 6. Kabasakal L., Devos A., Fettich J., Franken P., Guilloteau D., Hustinx R., et al. Optimum tomographic reconstruction parameters for HMPAO brain SPET imaging: a practical approach based on subjective and objective indexes. European Journal of Nuclear Medicine. 1995; 22(8): 671–677. https://doi.org/10.1007/BF01254569.
- 7. Okuda K., Fujii S., & Sakimoto S. Impact of Novel Incorporation of CT-based Segment Mapping into a Conjugated Gradient Algorithm on Bone SPECT Imaging: Fundamental Characteristics of a Context-specific Reconstruction Method. Asia Oceania Journal of Nuclear Medicine & Biology. 2019; 7(1): 49–57. https://doi.org/10.22038/aojnmb.2018.31711.1219.
- 8. Händel P. Understanding Normalized Mean Squared Error in Power Amplifier Linearization. IEEE Microwave and Wireless Components Letters. 2018; 28(11): 1047–1049. https://doi.org/10.1109/LMWC.2018.2869299.
- 9. Akamatsu M., Yamashita Y., Akamatsu G., Tsutsui Y., Ohya N., Nakamura Y., et al. Influences of reconstruction and attenuation correction in brain SPECT images obtained by the hybrid SPECT/CT device: evaluation with a 3-dimensional brain phantom. Asia Oceania Journal of Nuclear Medicine & Biology. 2015; 2(1): 24–29. https://doi.org/10.7508/aojnmb.2014.01.005.
- 10. Abe K., Hosono M., Igarashi T., Iimori T., Ishiguro M., Ito T., et al. The 2020 national diagnostic reference levels for nuclear medicine in Japan. Annals of Nuclear Medicine. 2020; 34(11): 799–806. pmid:32852747
- 11. Raunig D. L., McShane L. M., Pennello G., Gatsonis C., Carson P. L., Voyvodic J. T., et al. Quantitative imaging biomarkers: A review of statistical methods for technical performance assessment. Statistical Methods in Medical Research. 2015; 24(1): 27–67. pmid:24919831
- 12. Aide N., Lasnon C., Veit-Haibach P., Sera T., Sattler B., & Boellaard R. EANM/EARL harmonization strategies in PET quantification: from daily practice to multicentre oncological studies. European Journal of Nuclear Medicine and Molecular Imaging. 2017; 44: S17–S31. pmid:28623376
- 13. Varrone A., Asenbaum S., Vander Borght T., Booij J., Nobili F., Någren K., et al. EANM procedure guidelines for PET brain imaging using [18F]FDG, version 2. European Journal of Nuclear Medicine and Molecular Imaging. 2009; 36: 2103–2110. pmid:19838705
- 14. Boellaard R., Delgado-Bolton R., Oyen W. J. G., Giammarile F., Tatsch K., Eschner W., et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. European Journal of Nuclear Medicine and Molecular Imaging. 2015; 42(3): 328–354. pmid:25452219
- 15. Venkatanath N., Praneeth D., Maruthi Chandrasekhar B., Channappayya S. S., & Medasani S. S. Blind image quality evaluation using perception based features. In: 2015 Twenty First National Conference on Communications (NCC). 2015. https://doi.org/10.1109/NCC.2015.7084843.
- 16. Mittal A., Moorthy A. K., & Bovik A. C. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing. 2012; 21(12): 4695–4708. pmid:22910118
- 17. Mittal A., Soundararajan R., & Bovik A. C. Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters. 2013; 20(3): 209–212. https://doi.org/10.1109/LSP.2012.2227726.
- 18. Japanese Society of Nuclear Medicine, PET Nuclear Medicine Committee. (2021). Phantom Testing Procedures for Brain PET Imaging Using 18F-FDG and Amyloid Imaging Agents, 5th Edition. Version 2021/2/8. https://jsnm.org/wp_jsnm/wp-content/uploads/2021/02/Dementia_PhantomTest_20210208.pdf.
- 19. Fahey F. H., Clark P., Zukotynski K., Schuster D. M., Strauss K. J., Crandall B., et al. Use of a qualification phantom for PET brain imaging in a multicenter consortium: A collaboration between the Pediatric Brain Tumor Consortium and the SNMMI Clinical Trials Network. Journal of Nuclear Medicine. 2019.
- 20. Salvadori J., Imbert L., Pietquin M., Karcher G., Lamiral Z., & Marie P.-Y. Head-to-head comparison of image quality between brain 18F-FDG images recorded with a fully digital versus a last-generation analog PET camera. EJNMMI Research. 2019; 9(1): 61. pmid:31300962
- 21. Bradley M. M., & Lang P. J. Measuring emotion: The Self-Assessment Manikin and the Semantic Differential. Journal of Behavior Therapy and Experimental Psychiatry. 1994; 25(1): 49–59. pmid:7962581
- 22. McDermott G. M., Chowdhury F. U., & Scarsbrook A. F. Evaluation of noise equivalent count parameters as indicators of adult whole-body FDG-PET image quality. Annals of Nuclear Medicine. 2013; 27(8): 855–861. pmid:23925895
- 23. Treglia G., Bertagna F., Sadeghi R., Ceriani L., Alavi A., Giovanella L., et al. Diagnostic value of FDG PET-CT quantitative parameters and Deauville-like 5 point-scale in predicting malignancy of focal thyroid incidentaloma. Frontiers in Medicine. 2019; 6: 24. pmid:30809525
- 24. Chani-Cahuana J., Fager C., & Eriksson T. Lower bound for the normalized mean square error in power amplifier linearization. IEEE Microwave and Wireless Components Letters. 2018; 28(5): 435–437. https://doi.org/10.1109/LMWC.2018.2817021.
- 25. Sheikh H. R., Sabir M. F., & Bovik A. C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing. 2006; 15(11): 3440–3451. pmid:17076403
- 26. Marziliano P., Dufaux F., Winkler S., & Ebrahimi T. Perceptual blur and ringing metrics: application to JPEG2000. Signal Processing: Image Communication. 2004; 19(2): 163–172. https://doi.org/10.1016/j.image.2003.08.003.
- 27. Sheikh H. R., Bovik A. C., & Cormack L. No-reference quality assessment using natural scene statistics: JPEG2000. IEEE Transactions on Image Processing. 2005; 14(11): 1918–1927. pmid:16279189
- 28. Capobianco N., Meignan M., Cottereau A. S., Becker S., Alavi A., Stolz C., et al. Deep-learning 18F-FDG uptake classification enables total metabolic tumor volume estimation in diffuse large B-cell lymphoma. Journal of Nuclear Medicine. 2021; 62(1): 30–36. pmid:32532925
- 29. Gifford H. C., King M. A., de Vries D. J., & Soares E. J. Channelized hotelling and human observer correlation for lesion detection in hepatic SPECT imaging. Journal of Nuclear Medicine. 2000; 41(3): 514–521. pmid:10716327
- 30. Boellaard R. Standards for PET Image Acquisition and Quantitative Data Analysis. Journal of Nuclear Medicine. 2009; 50(Suppl 1): 11S–20S. pmid:19380405
- 31. Westerterp M., Pruim J., Oyen W., Hoekstra O., Paans A., Visser E., et al. Quantification of FDG PET studies using standardised uptake values in multi-centre trials: effects of image reconstruction, resolution and ROI definition parameters. European Journal of Nuclear Medicine and Molecular Imaging. 2007; 34(3): 392–404. pmid:17033848
- 32. Van Slambrouck K., Stute S., Comtat C., Sibomana M., van Velden F. H. P., et al. Bias reduction for low-statistics PET: Maximum likelihood reconstruction with a modified Poisson distribution. IEEE Transactions on Medical Imaging. 2015; 34(1): 126–136. pmid:25137726
- 33. Almeida B. A. L. C., Rodrigues H. L. d. N. R., Almeida P. L., Araújo S. M. A. J. Immediate effect of ankle mobilization on range of motion, dynamic knee valgus, and knee pain in women with patellofemoral pain and ankle dorsiflexion restriction: A randomized controlled trial with 48-hour follow-up. Journal of Sport Rehabilitation. 2021; 30. pmid:33373976
- 34. Zhang J. Making diagnoses with multiple tests under no gold standard. Iowa Research Online: https://ir.uiowa.edu/etd/3025. 2012. https://doi.org/10.17077/etd.wq3qkdxu.
- 35. Kupinski M. A., Hoppin J. W., Krasnow J., Dahlberg S., Leppo J. A., King M. A., et al. Comparing cardiac ejection fraction estimation algorithms without a gold standard. Academic Radiology. 2006; 13(3): 329–337. pmid:16488845
- 36. Liu J., Liu Z., Mhlanga J., Siegel B. A., & Jha A. K. A no-gold-standard technique to objectively evaluate quantitative imaging methods using patient data: Theory. Medical Physics. 2020. arXiv:2006.02290. https://doi.org/10.48550/arXiv.2006.02290.