
Synergistic fusion: An integrated pipeline of CLAHE, YOLO models, and advanced super-resolution for enhanced thermal eye detection

Abstract

Accurate eye detection in thermal images is essential for diverse applications, including biometrics, healthcare, driver monitoring, and human-computer interaction. However, the inherent limitations of thermal data, such as low resolution and poor contrast, often hinder this accuracy. This work addresses these challenges with a novel, multifaceted approach that combines deep learning and image processing techniques. We first introduce a unique dataset of thermal facial images with meticulously annotated eye locations. To improve image clarity, we apply Contrast Limited Adaptive Histogram Equalization (CLAHE). We then explore the effectiveness of advanced YOLO models (YOLOv8 and YOLOv9) for accurate eye detection. Our experiments show that YOLOv8 applied to CLAHE-enhanced images achieved the highest accuracy (precision and recall of 1, mAP50 of 0.995, and mAP50-95 of 0.801), while YOLOv9 also performed excellently, with a precision of 0.998, recall of 0.998, mAP50 of 0.995, and mAP50-95 of 0.753. Furthermore, to enhance the resolution of the detected eye regions, we investigate super-resolution techniques ranging from traditional Bicubic interpolation to generative adversarial networks (BSRGAN, ESRGAN) and advanced models such as Real-ESRGAN, SwinIR, SwinIR-Large, and ResShift. The performance of these techniques is evaluated using both objective and subjective quality measures. Overall, this work demonstrates the effectiveness of the proposed pipeline, which integrates image enhancement, deep learning, and super-resolution. This synergistic fusion significantly improves the contrast, the eye detection accuracy, and the overall resolution of thermal images, paving the way for applications across various fields.

Introduction

Thermal imaging is a promising technique that offers contact-free and reliable detection of facial features such as the eyes, which is crucial for applications in security, human-computer interaction, the automotive industry, healthcare, search and rescue, emotion recognition, and electrical and occupational safety [1–3]. Unlike traditional face recognition methods, which struggle in low-light or obscured conditions, thermal imaging senses infrared radiation and therefore opens new possibilities for various technologies [4]. For example, Yawen Lu et al. proposed an alternative to LiDAR for nighttime depth estimation that uses a single thermal image, making it a viable solution for nighttime autonomous navigation [5]. Infrared thermal images are captured with infrared thermal cameras, which detect the heat distribution on a target object’s surface [6]; such images are called thermograms [7]. Thermal cameras offer a range of color palettes such as Iron, Gray, Rainbow, Arctic, and Lava [8]. The thermal images used in this article use the Rainbow palette, which assigns colors to temperature levels, with cooler areas in blues and greens and warmer regions in yellows and reds. The choice of color palette can significantly affect the interpretation of thermal data and is often tailored to specific applications and preferences. Because thermal images tend to be blurred, enhancing them is important, but enhancement must be applied carefully so that the temperature distribution is preserved without sacrificing detail. Several methods have been proposed to improve the contrast of thermal images, making details more discernible and improving overall visibility [9]. Enhancement helps extract detailed information from low-resolution data, improves object recognition, and increases overall image clarity, which aids applications in which precise thermal analysis is vital.

The literature describes a multitude of detection techniques that use thermal imagery across a range of applications. To diagnose ocular surface disorders, Padmapriya et al. developed YOLOv2-based eye localization in thermal images [10]. Ilikci et al. proposed using the YOLOv3 algorithm to recognize emotions and faces in thermal images [11]. Ghourabi et al. [12] employed the YOLOv5 algorithm to identify the eye region in thermal images in order to measure the temperature at the eye canthus precisely and detect possible infections. Ghourabi et al. [13] also used YOLOv7 for eye recognition to compute the eye canthus temperature in the elderly. Using machine learning and the Internet of Things (IoT), Klaib et al. [14] discussed a range of algorithms for tracking the human eye in various applications, particularly for the elderly and for human-computer interaction. Budzan et al. created a method to detect the face and eyes in a thermal image that combines template matching with a modified Randomised Hough Transform (RHT) [15]. Knapik et al. performed fast eye detection in thermal images using a bag-of-visual-words technique with clustering [16]. Zijie Zhou et al. explored driver vigilance detection using thermal imaging, integrating YOLOv5 for facial region detection with a hybrid-attention deep learning model for classification [17]. Various other deep learning-based algorithms have been proposed to detect and localize the face and eyes in thermal images [18]. However, because thermal images have blurred edges and lower quality than visible-light images, enhancing the resolution of the eye region is more complicated. For facial recognition in low-resolution thermal images, a texture-based detector employing Haar features and the AdaBoost algorithm has been applied, after which the relationships among the detected facial features are analyzed with a rotation-invariant complex Gaussian distribution [19]. In the context of child-robot interaction, a system for recognizing children’s emotions has been proposed that records facial images using both visible and infrared thermal imaging [20]. Researchers have explored many deep learning architectures to enhance the resolution of thermal images [21], and GAN-based methods have been used to enhance and restore their resolution [22]. Researchers have also found that tailoring the guidance information to thermal imaging yields superior results within the framework of guided super-resolution [23]; this approach could potentially be adapted to enhance the resolution of eye images.

Thermal images inherently suffer from low resolution and poor contrast because of the nature of infrared sensors and the thermal radiation they detect [19]. This limitation arises primarily from the longer wavelengths of thermal radiation compared with visible light, which lead to less distinct edges and fewer fine details [9]. Thermal images also typically exhibit poor contrast because the temperature differences between objects are often small, making the images appear flat and less informative. These challenges can reduce the accuracy and effectiveness of thermal imaging and call for advanced image processing techniques that enhance quality and extract meaningful information [24]. Whereas some deep learning-based quality assessment models rely on reference images and feature fusion [25,26], our study takes a more practical no-reference approach. Because reference (ground-truth high-resolution) images are unavailable in thermal imaging, we evaluate the quality of the super-resolved thermal images using well-established no-reference image quality assessment metrics together with subjective Mean Opinion Scores, to ensure a comprehensive assessment.

Existing research typically addresses image quality improvement by focusing on a single aspect, such as detection, enhancement, or super-resolution. Each of these approaches has its own merits, contributing either better resolution or enhanced contrast, but they are generally applied in isolation. In this work, we instead develop a hybrid pipeline that combines enhancement, detection, and super-resolution into a single, integrated approach. This strategy delivers higher contrast and higher resolution together rather than prioritizing one over the other, and the integration makes processing more effective and efficient, producing clearer, more detailed thermal images that support accurate analysis and interpretation.

Our research focuses on building an accurate and efficient system for locating the eyes in infrared thermal images of human faces while overcoming the inherent limitations of thermal images, namely their low resolution and poor contrast. The process involves curating and annotating a database of 308 thermal face images, enhancing image quality with CLAHE (Contrast Limited Adaptive Histogram Equalization), and employing several YOLO versions to locate the eye regions. Because the detected eye regions are small and of low resolution, we then explore super-resolution techniques to increase their spatial resolution, addressing a common challenge in thermal imaging. The proposed work on detecting human eyes in thermal images and enhancing their resolution makes the following novel contributions:

  • A novel multifaceted pipeline for accurate eye detection in thermal images that addresses low resolution and poor contrast.
  • A unique dataset of thermal face images with labelled eye locations for training and evaluation.
  • CLAHE pre-processing of thermal images to improve contrast and boost eye detection.
  • A demonstration of the effectiveness of YOLO models, especially YOLOv8 and YOLOv9, for accurate eye detection.
  • An evaluation of super-resolution techniques for enhancing the resolution of the detected eyes.

This paper is organized into seven sections. Section 2 introduces the thermal image dataset. Section 3 gives an overview of the methodology and the algorithms used. Section 4 discusses in depth the performance analysis and the results achieved by the different detection and super-resolution algorithms. Section 5 discusses the practical implications and potential applications. Section 6 outlines the limitations and suggests future improvements. Section 7 presents the conclusions drawn from the observations.

Materials-dataset details

There is a significant shortage of high-quality thermal human face image datasets for eye detection. Existing datasets focus primarily on thermal emotion recognition and pedestrian detection, and many researchers rely on private datasets that are not publicly accessible. This highlights the urgent need for a standardized thermal human face image dataset. To address this gap, we created a dataset using a FLIR-E75 thermal imaging camera, capturing images at a resolution of 640x480 pixels. The thermal images were collected from 308 volunteers (50 females and 258 males, aged 20–40) at Vellore Institute of Technology, Chennai, all of whom provided informed consent. The volunteers included students, research scholars, and faculty members.

This study was approved by the Ethics Committee at Vellore Institute of Technology, Chennai. The dataset was collected over two months, from June 3, 2024, to July 31, 2024, and data collection adhered strictly to the ethical principles for research involving human subjects set out in the Declaration of Helsinki. Participants were verbally informed about the nature of the research, and each signed a consent form authorizing the use of their data for academic and research purposes. Individuals under the age of 18 were excluded from the study. The consent form followed the guidelines and format of the Institutional Ethics Committee for Studies on Human Subjects (IECH) at Vellore Institute of Technology, Chennai, and included a concise explanation of the data collection; participants’ written consent was collected. All consent forms have been stored securely, and the collected image data have been anonymized. Data were collected in a controlled environment to ensure standardization and to support various applications, including healthcare monitoring of eye temperature.

Participants were positioned one meter from the camera. Precise annotation is crucial for supervised learning in eye detection, so each image was meticulously annotated with bounding boxes around the eye regions by well-trained researchers and reviewed by two teams of computer vision experts for accuracy. The dataset is divided into three groups: 70% for training, 20% for validation, and 10% for testing, ensuring robust evaluation of the model’s performance. The self-curated dataset aims to advance research in thermal eye detection by providing a valuable resource for the scientific community.
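A minimal sketch of the 70/20/10 split follows, assuming a flat list of image files. The file names and the fixed seed are hypothetical, and the source does not say whether the split was made before or after augmentation; splitting the 308 original captures first, as assumed here, is one common way to keep augmented copies of the same face out of more than one subset.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle items and split them into train/validation/test lists (70/20/10 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder (about 10%) becomes the test set
    return train, val, test

# Hypothetical file names for the 308 original captures
images = [f"subject_{i:03d}.jpg" for i in range(308)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))
```

With 308 items, rounding yields 216 training, 62 validation, and 30 test images.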

Augmentation enriches the dataset with diverse variations, helping deep learning models generalize across scenarios and mitigating overfitting. By simulating real-world variability, it also makes the models more robust and reliable. The curated dataset of 308 images is augmented in several ways: horizontal and vertical flips; rotations by 90 degrees clockwise, 90 degrees counter-clockwise, and 180 degrees (upside down); random rotations between −20 and 20 degrees; and horizontal and vertical shear distortions of up to 15 degrees. Each original image yields approximately eight distinct images, giving 2318 images in total. Flipping, rotation, and shear are used because they preserve the temperature information encoded in the thermal images: flips and 90-degree rotations only rearrange pixels without changing their values, the small random rotations and shears alter values only through interpolation at the resampled positions, and none of these operations adds noise. Table 1 summarises the number of images in the dataset. Sample thermal images from the dataset are shown in Fig 1.
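The sketch below illustrates the exact, value-preserving part of this augmentation with NumPy, using a random array as a stand-in for a 640x480 thermal capture. It covers only the flips and the 90/180-degree rotations and checks that they reorder pixels without changing any value; the random ±20 degree rotations and 15 degree shears are omitted because they require resampling with an interpolation library. The dictionary keys are illustrative names, not identifiers from the authors’ pipeline.

```python
import numpy as np

def exact_augmentations(img):
    """Flips and 90/180-degree rotations: they move pixels but never change their values."""
    return {
        "flip_h": np.fliplr(img),         # horizontal flip
        "flip_v": np.flipud(img),         # vertical flip
        "rot90_ccw": np.rot90(img, k=1),  # 90 degrees counter-clockwise
        "rot90_cw": np.rot90(img, k=-1),  # 90 degrees clockwise
        "rot180": np.rot90(img, k=2),     # upside down
    }

rng = np.random.default_rng(0)
thermal = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)  # stand-in RGB capture
augmented = exact_augmentations(thermal)
for name, out in augmented.items():
    # Comparing sorted values checks that the multiset of pixel values is unchanged
    same_values = np.array_equal(np.sort(out, axis=None), np.sort(thermal, axis=None))
    print(name, out.shape, same_values)
```

The rotated outputs become 640 pixels tall and 480 wide, but every version contains exactly the same pixel values as the original, which is the property the text relies on.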

Fig 1. Sample original and augmented images.

The first row represents the original thermal images captured using a FLIR camera. The second row shows augmented versions of these images, incorporating transformations such as rotation, scaling, and vertical shear distortions.

https://doi.org/10.1371/journal.pone.0328227.g001

Methodology and algorithms used

Thermal image acquisition is the first and most critical phase of this methodology. The captured images are then labeled, annotated, and augmented to create a dataset, which is divided into training, validation, and testing sets. The flow of the proposed methodology is illustrated in Fig 2. Two sets of experiments evaluate how well different YOLO algorithms detect eye regions in thermal images of human faces: in the first, the YOLO algorithms are applied directly to the raw dataset; in the second, they are applied to images enhanced with various image processing techniques. The selected YOLO model is then used to detect the eye regions in the thermal images, and these regions of interest (ROIs) are cropped from the images using the detected bounding box coordinates. Because each cropped eye region spans only a small number of pixels, its resolution is low. To address this, various super-resolution algorithms are applied to the cropped eye regions, and their performance is evaluated. Algorithm I provides a detailed breakdown of the steps involved.
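To illustrate the ROI-cropping step, the sketch below converts a box in the normalized YOLO label format (x_center, y_center, width, height, each in [0, 1]) into pixel corners and slices the eye region out of a NumPy image. The box values are hypothetical. When boxes come from an Ultralytics prediction rather than a label file, `results[0].boxes.xyxy` already holds pixel corner coordinates (as a tensor), so only the slicing step would apply.

```python
import numpy as np

def yolo_box_to_pixels(box, img_w, img_h):
    """Convert a normalized YOLO box (x_center, y_center, width, height) to pixel corners."""
    xc, yc, w, h = box
    x1 = int(round((xc - w / 2) * img_w))
    y1 = int(round((yc - h / 2) * img_h))
    x2 = int(round((xc + w / 2) * img_w))
    y2 = int(round((yc + h / 2) * img_h))
    # Clip so that boxes touching the image border still give valid slices
    return max(0, x1), max(0, y1), min(img_w, x2), min(img_h, y2)

def crop_roi(img, box):
    """Crop the region described by a normalized YOLO box from an H x W (x C) array."""
    h, w = img.shape[:2]
    x1, y1, x2, y2 = yolo_box_to_pixels(box, w, h)
    return img[y1:y2, x1:x2]

frame = np.zeros((640, 640, 3), dtype=np.uint8)  # images are resized to 640x640
eye = crop_roi(frame, (0.40, 0.35, 0.10, 0.05))  # hypothetical eye detection
print(eye.shape)
```

For this hypothetical box the crop is only 64 pixels wide and 32 pixels tall, which shows why the extracted eye regions need super-resolution.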

Fig 2. Block diagram of the proposed work.

The block diagram outlines the workflow for eye detection and enhancement from thermal images. It includes image acquisition, annotation, augmentation, contrast enhancement, YOLO-based detection, ROI extraction, super-resolution application, and performance evaluation.

https://doi.org/10.1371/journal.pone.0328227.g002

Enhancement using CLAHE

To improve the contrast of thermal face images, Local Histogram Equalization (LHE) with Bilateral Filtering is employed. Compared with global histogram equalization, this more sophisticated approach enhances contrast locally while preserving details [27]. It first converts the image to the LAB color space and applies bilateral filtering to the lightness (L) channel to smooth intensity variations while preserving edges. LHE is then applied to small image patches within the filtered L channel, enhancing contrast within localized regions. Finally, the processed L channel is merged with the original A and B channels, and the image is converted back to its original color space. This combined approach improves the visibility of temperature variations across the faces in the thermal images.
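The local equalization step can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: it equalizes non-overlapping tiles of a single 8-bit channel (standing in for the L channel) independently, and it omits the LAB conversion and the bilateral filtering, which in OpenCV would typically be done with `cv2.cvtColor(img, cv2.COLOR_BGR2LAB)` and `cv2.bilateralFilter`. The tile size is an arbitrary choice.

```python
import numpy as np

def equalize(tile, levels=256):
    """Histogram-equalize one 8-bit tile using its cumulative histogram."""
    hist = np.bincount(tile.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)][0]  # count of the darkest value present
    if cdf[-1] == cdf_min:  # single-valued tile: nothing to stretch
        return tile.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[tile]

def local_hist_eq(channel, tile=(8, 8)):
    """Equalize non-overlapping tiles of a 2-D uint8 channel independently."""
    out = np.empty_like(channel)
    th, tw = tile
    for r in range(0, channel.shape[0], th):
        for c in range(0, channel.shape[1], tw):
            out[r:r + th, c:c + tw] = equalize(channel[r:r + th, c:c + tw])
    return out

rng = np.random.default_rng(1)
L = rng.integers(100, 116, size=(64, 64), dtype=np.uint8)  # low-contrast stand-in for the L channel
L_eq = local_hist_eq(L, tile=(16, 16))
print(L.min(), L.max(), L_eq.min(), L_eq.max())
```

Each tile is stretched to the full 0–255 range on its own, so a narrow band of intensities is expanded very strongly; this is the noise-amplification behaviour that CLAHE’s clip limit is designed to restrain.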

CLAHE (Contrast Limited Adaptive Histogram Equalization) is an image processing technique used here to enhance the contrast of thermal images. It extends LHE by addressing its tendency to over-amplify noise. CLAHE divides the thermal image into tiles (blocks) and, before equalizing each tile, clips the tile’s histogram at a specified threshold, the clip limit (also called the contrast limiting factor). Clipping the histogram bins that exceed this threshold bounds the contrast enhancement allowed in each tile, so local contrast improves without the exaggerated noise or extreme contrast that unrestricted equalization can produce [28]. When CLAHE is applied to color-mapped thermal images, it is often done in the LAB color space, which separates color information from intensity (lightness). Applying CLAHE to the lightness channel therefore boosts contrast while preserving the essential thermal details. An ablation study was conducted to investigate the effects of different combinations of the ClipLimit and GridSize parameters, testing (ClipLimit = 2, GridSize = 4x4), (ClipLimit = 2, GridSize = 8x8), (ClipLimit = 4, GridSize = 16x16), and (ClipLimit = 10, GridSize = 32x32). CLAHE is widely applied to thermal images in fields such as thermal surveillance and medical thermography, where improved contrast is essential for accurate visual interpretation and analysis.
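The clip-and-redistribute step that distinguishes CLAHE from plain local equalization is sketched below for a single tile. Expressing `clip_limit` relative to the mean bin height is an assumption chosen to resemble OpenCV’s `clipLimit`, and the sketch omits the bilinear interpolation between neighbouring tiles that full CLAHE implementations perform.

```python
import numpy as np

def clipped_equalization_lut(tile, clip_limit=2.0, levels=256):
    """Build the contrast-limited equalization mapping for one 8-bit tile.

    clip_limit is relative to the mean bin height (tile.size / levels), an
    assumption modelled on OpenCV's clipLimit.
    """
    hist = np.bincount(tile.ravel(), minlength=levels).astype(np.float64)
    limit = max(clip_limit * tile.size / levels, 1.0)  # absolute cap per bin
    excess = np.sum(np.maximum(hist - limit, 0.0))     # counts above the cap
    hist = np.minimum(hist, limit) + excess / levels   # clip, then spread the excess evenly
    cdf = np.cumsum(hist)                              # total is still tile.size
    return np.round(cdf / cdf[-1] * (levels - 1)).astype(np.uint8)

# A nearly uniform 64x64 tile: 3968 pixels at 120 and 128 pixels at 121
tile = np.full((64, 64), 120, dtype=np.uint8)
tile[:, :2] = 121
clahe_lut = clipped_equalization_lut(tile, clip_limit=2.0)
plain_lut = clipped_equalization_lut(tile, clip_limit=256.0)  # cap >= tile size: no clipping
print(np.diff(clahe_lut.astype(int)).max(), np.diff(plain_lut.astype(int)).max())
```

On this nearly uniform tile, the unclipped mapping jumps by more than 200 grey levels at a single input value, whereas the clipped mapping rises by only a few levels per input value. In practice the full procedure is available in OpenCV as `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))`, whose `apply` method takes the 8-bit L channel; the ablation settings above correspond to different values of these two arguments.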

Detection techniques using YOLOv8 and YOLOv9

YOLO (You Only Look Once) divides the image into a grid and, for each grid cell, predicts bounding box parameters together with class probabilities. YOLO performs well on objects of different sizes because it uses multi-scale feature maps that capture global context, and it is effective at locating small objects, which can be difficult for other detection techniques [29]. The family has evolved through several versions, from YOLOv1 to the recent YOLOv9, with major developments that have progressively improved accuracy and performance. YOLO models consist of three main blocks: the backbone, the neck, and the head. YOLO has been adapted to specialized tasks, including face detection, position estimation, and even eye recognition in infrared thermal images of human faces; this work examines this adaptability beyond basic object detection. YOLO continues to shape the direction of object detection and to motivate new computer vision research, with each iteration introducing features aimed at improving accuracy and performance. YOLOv5 incorporates CSPDarknet53 and CSP-PANet, where CSP stands for Cross Stage Partial connections [30]; splitting and re-aggregating feature maps in this way improves eye detection accuracy. YOLOv5 computes the class loss and the objectness loss using Binary Cross Entropy (BCE) and the localization loss using the Complete Intersection over Union (CIoU) loss. These three components together form the YOLO loss function, as shown in Equation 1.

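$$\mathcal{L} = \lambda_{box}\,\mathcal{L}_{CIoU} + \lambda_{obj}\,\mathcal{L}_{obj}^{BCE} + \lambda_{cls}\,\mathcal{L}_{cls}^{BCE} \tag{1}$$

where the coefficients $\lambda_{box}$, $\lambda_{obj}$, and $\lambda_{cls}$ weight the localization, objectness, and classification components.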
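As a sketch of the localization term in Equation 1, the function below computes the CIoU loss (1 - CIoU) for two axis-aligned boxes, following the usual definition: IoU minus a normalized centre-distance penalty minus a weighted aspect-ratio consistency term. It is a plain-Python illustration, not the YOLOv5 source code, and the `eps` guard value is an arbitrary choice.

```python
import math

def ciou_loss(pred, target, eps=1e-9):
    """Complete-IoU loss (1 - CIoU) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection and union areas
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)
    # Squared centre distance, normalized by the squared diagonal of the enclosing box
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)

print(ciou_loss((10, 10, 50, 30), (10, 10, 50, 30)))  # identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))          # disjoint unit squares
```

Unlike plain IoU, this loss still provides a useful signal when the boxes do not overlap: for two unit squares whose centres are two units apart, IoU is 0, but the centre-distance penalty raises the loss to 1.4, so the gradient still pulls the prediction toward the target.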

YOLOv7, inspired by the ELAN (Efficient Layer Aggregation Network) architecture, integrates a Feature Pyramid Network (FPN) and bag-of-freebies (BoF) training techniques for superior eye localization without relying on pre-trained backbones [31].

YOLOv8 architecture.

YOLOv8 maintains the same architectural foundation as its predecessors but introduces several significant enhancements that benefit precise eye localization in thermal face images. Let the input image be denoted by $I \in \mathbb{R}^{H \times W \times C}$, where $H$ is the height, $W$ is the width, and $C$ is the number of channels. As illustrated in Fig 3, YOLOv8 integrates the Path Aggregation Network (PAN) [32] and FPN, along with an innovative labeling tool that streamlines the annotation process. The backbone network extracts feature maps from the input image at several levels through multiple convolutional layers, progressively reducing the spatial resolution while increasing the number of feature channels. The FPN then builds a pyramid by combining high-resolution low-level features with low-resolution high-level features, producing feature maps capable of detecting eyes at various scales and resolutions. The output of the FPN is expressed in Equation 2.

Fig 3. YOLOv8 architecture.

The YOLOv8 architecture follows a multi-scale feature extraction approach. The backbone processes input images through different feature levels (P1–P5). ‘C’ denotes concatenation, ‘C2f’ represents a modified CSP layer integrating two parallel gradient flow branches, and ‘U’ indicates upsampling. The detection head outputs bounding boxes (Bbox) optimized using CIoU-DFL and classification scores (Cls) using BCE loss.

https://doi.org/10.1371/journal.pone.0328227.g003

(2)

Here, Equation 2 gives the feature map at each pyramid level, with the convolution operator applied to the connected features. PAN complements the FPN by using skip connections to merge features from multiple network layers, which, together with YOLOv8’s anchor-free detection approach, improves adaptability and effectiveness. YOLOv8 also incorporates a modified CSP layer, the C2f module, and combines it with concatenation (C) and upsampling (U) operations to improve eye detection accuracy [33]. The C2f module is represented mathematically in Equation 3.

(3)

Here, the inputs to Equation 3 are the current feature map and the previous feature map. For the bounding box loss, YOLOv8 employs the Distribution Focal Loss (DFL) and the Complete Intersection over Union (CIoU) loss, while the classification loss relies on binary cross-entropy. The loss function is given in Equation 4.

(4)

Here, the three weighting coefficients in Equation 4 scale the box loss, the class loss, and the distribution focal loss, and the remaining terms denote the weight decay and N, the number of cells that contain an object. The result is a strong combination of speed and precision, making YOLOv8 a powerful choice for eye localization in thermal face images [34].
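Distribution Focal Loss treats each box-side distance as a discrete probability distribution over integer bins rather than a single regressed number. The sketch below computes DFL for one box side as defined in the Generalized Focal Loss work: a cross-entropy on the two bins that bracket the continuous target, each weighted by the target’s proximity to it. The number of bins and the example logits are illustrative assumptions.

```python
import numpy as np

def distribution_focal_loss(logits, target):
    """DFL for one box side.

    logits: scores for integer bins 0..n-1; target: continuous distance in [0, n-1).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax: log-probabilities of each bin
    log_probs = logits - logits.max()
    log_probs = log_probs - np.log(np.exp(log_probs).sum())
    left = int(np.floor(target))
    right = left + 1
    # Cross-entropy on the two bins around the target, weighted by proximity
    w_left = right - target
    w_right = target - left
    return -(w_left * log_probs[left] + w_right * log_probs[right])

uniform = np.zeros(16)
peaked = np.zeros(16)
peaked[5] = peaked[6] = 5.0  # probability mass concentrated on bins 5 and 6
print(distribution_focal_loss(uniform, 5.25))
print(distribution_focal_loss(peaked, 5.5), distribution_focal_loss(peaked, 10.5))
```

With uniform logits over 16 bins the loss equals log 16 for any target, and it falls as the predicted distribution concentrates its mass on the two bins surrounding the target.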

YOLOv9 architecture innovations and improvements.

YOLOv9, the latest iteration in the YOLO series, stands out for its performance, efficiency, and techniques that address a key challenge in deep learning: the loss of information as data passes through successive network layers. To combat this, YOLOv9 introduces Programmable Gradient Information (PGI), which employs an auxiliary reversible branch working alongside the main network to preserve crucial information and generate reliable gradients during training [35]. The reversible branch supports backward propagation by maintaining gradient flow and preventing vanishing gradients. This auxiliary pathway is expressed in Equations 5, 6, 7, and 8.

Forward Pass:

$$y = F(x) \tag{5}$$

where $x$ is the input to the network and $F$ is the transformation applied by the network layers.

Reversible auxiliary branch:

(6)

(7)

Here, the auxiliary functions in Equations 6 and 7 preserve information from the input.

Gradient calculation:

(8)

Here, the gradients in Equation 8 are computed from the training loss function. This auxiliary pathway enhances the model’s learning process, enabling effective extraction of crucial information even with lightweight architectures. In addition, YOLOv9 incorporates the Generalized Efficient Layer Aggregation Network (GELAN) to efficiently merge features from different network levels, which is crucial for accurately detecting objects of various sizes and scales. Unlike designs based on depth-wise convolutions, GELAN uses only conventional convolutions yet achieves better parameter utilization without sacrificing efficiency. The feature extraction and feature aggregation steps of GELAN are represented in Equations 9 and 10, respectively.

(9)

Here, the operator in Equation 9 denotes the conventional convolution at a given layer, and the input term is the input image.

(10)

Here, the weights in Equation 10 are learnable, and the result is the aggregated feature map. Figs 4 and 5 illustrate the two key innovations in YOLOv9’s architecture, PGI and GELAN, which work together to enhance the model’s performance. Furthermore, YOLOv9 training uses mosaic data augmentation, which randomly combines four images into a single training image to create a diverse and realistic dataset, improving generalization and reducing overfitting. The mosaic augmentation is described in Equations 11 and 12.

Fig 4. Programmable Gradient Information (PGI).

An auxiliary reversible branch that preserves gradient flow, preventing information loss and enhancing feature extraction.

https://doi.org/10.1371/journal.pone.0328227.g004

Fig 5. Architecture of Generalized Efficient Layer Aggregation Network (GELAN).

A hierarchical structure that efficiently merges multi-scale features using conventional convolutions for improved object detection.

https://doi.org/10.1371/journal.pone.0328227.g005

(11)

Here, the four inputs in Equation 11 are the images to be combined.

(12)

Here, one operator in Equation 12 arranges the images in a grid and the other extracts a random region from it. The overall loss function of YOLOv9 combines the box loss, the class loss, and a new gradient preservation (GP) loss term that supports robust training, as expressed in Equation 13.

(13)

Here, the three weighting coefficients in Equation 13 scale the box loss, the class loss, and the gradient preservation loss, and, as in Equation 4, the remaining terms denote the weight decay and N, the number of cells that contain an object. While building on the successes of earlier versions such as YOLOv7 and YOLOv8, YOLOv9 represents a distinct step in the YOLO family: it addresses information loss, maintains efficiency, and introduces PGI and GELAN for superior object detection performance.
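The mosaic operation of Equations 11 and 12 can be sketched with NumPy as below, assuming four equally sized images. This is a simplified illustration: practical YOLO pipelines also choose a random mosaic centre, rescale each tile, and transform the bounding-box labels, none of which is shown here.

```python
import numpy as np

def mosaic(images, out_size=640, rng=None):
    """Tile four equally sized H x W x C images in a 2x2 grid and crop a random square."""
    rng = np.random.default_rng() if rng is None else rng
    top_left, top_right, bottom_left, bottom_right = images
    grid = np.concatenate(
        [np.concatenate([top_left, top_right], axis=1),
         np.concatenate([bottom_left, bottom_right], axis=1)],
        axis=0,
    )  # shape (2H, 2W, C)
    # Random top-left corner of the crop, i.e. the random region extracted from the grid
    y0 = int(rng.integers(0, grid.shape[0] - out_size + 1))
    x0 = int(rng.integers(0, grid.shape[1] - out_size + 1))
    return grid[y0:y0 + out_size, x0:x0 + out_size]

# Four uniform stand-in images with distinct grey levels 0, 60, 120, 180
imgs = [np.full((640, 640, 3), k * 60, dtype=np.uint8) for k in range(4)]
patch = mosaic(imgs, rng=np.random.default_rng(0))
print(patch.shape, np.unique(patch))
```

Because the crop window usually straddles the grid lines, a single training sample typically contains parts of several source images, which is what exposes the detector to more varied context per batch.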

Super-resolution techniques

Super-resolution techniques increase the resolution or quality of an image beyond its original dimensions. Both classical image processing methods and Generative Adversarial Networks (GANs) play an important role in their development. Bicubic interpolation is a traditional super-resolution method [36], used here to upscale low-resolution thermal eye images; it is fast but often produces blurry results, whereas GAN-based techniques generally yield better outputs. Together, these techniques offer a range of options for enhancing the resolution of thermal images, particularly for eye detection. BSRGAN and ESRGAN [37] are notable GAN-based approaches to super-resolution. BSRGAN produces high-resolution thermal eye images even without access to the original high-resolution versions. ESRGAN, an improvement over SRGAN, excels at producing visually pleasing and highly detailed high-resolution images suitable for real-time applications. Real-ESRGAN takes a distinctive approach by training purely on synthetic data, which enables it to handle a variety of real-world image degradations [38]; its high-order degradation modeling yields superior visual performance across diverse datasets. However, the computational cost of training and deploying these models may limit their real-time applicability in some scenarios. SwinIR and its variant SwinIR-Large are transformer-based models that capture long-range dependencies in the data, which is particularly useful for low-resolution thermal eye images [39]. Using stacked Residual Swin Transformer Blocks, these models progressively extract high-level features and complex relationships, achieving high-quality image restoration. SwinIR-Large has more parameters and layers than the original model for potentially better performance, and therefore requires more computational resources. ResShift is an efficient diffusion-based super-resolution model that builds a Markov chain between the high- and low-resolution images by shifting their residual, which lets it produce results in only a few sampling steps and keeps computational demands low [40]. Trained on large datasets, ResShift offers a promising way to obtain clearer and more detailed images of human eyes.
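For reference, the bicubic baseline can be written as separable cubic convolution. The sketch below assumes the Keys kernel with a = -0.5 (a common choice), half-pixel centre alignment, and edge replication; library resizers follow the same idea but may use a different kernel parameter and border handling, so their outputs can differ slightly.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (support of four samples)."""
    x = np.abs(x)
    return np.where(
        x <= 1, (a + 2) * x**3 - (a + 3) * x**2 + 1,
        np.where(x < 2, a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a, 0.0))

def resize_axis(arr, scale, axis):
    """Bicubic resampling of a float array along one axis."""
    arr = np.moveaxis(arr, axis, 0)
    n_in = arr.shape[0]
    n_out = int(round(n_in * scale))
    src = (np.arange(n_out) + 0.5) / scale - 0.5  # source coordinate of each output sample
    base = np.floor(src).astype(int)
    out = np.zeros((n_out,) + arr.shape[1:], dtype=np.float64)
    for offset in range(-1, 3):  # the four neighbouring input samples
        idx = np.clip(base + offset, 0, n_in - 1)  # replicate edge pixels
        w = cubic_kernel(src - (base + offset))
        out += w.reshape((-1,) + (1,) * (arr.ndim - 1)) * arr[idx]
    return np.moveaxis(out, 0, axis)

def bicubic_upscale(img, scale=4):
    """Separable bicubic upscaling of an H x W (or H x W x C) image."""
    out = resize_axis(img.astype(np.float64), scale, axis=0)
    return resize_axis(out, scale, axis=1)

eye_patch = np.full((16, 8), 7.0)  # hypothetical flat 16x8 crop
print(bicubic_upscale(eye_patch, 4).shape)
```

The kernel weights always sum to one, so flat regions stay flat, and with a scale factor of 1 the input is returned unchanged. The blur typical of bicubic results follows from the same structure: each output pixel is only a weighted average of four existing samples per axis, so no new high-frequency detail can be synthesized, which is the gap the learned models above aim to close.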

Results and discussion

The thermal camera produces images with a resolution of 640x480 pixels. Ground-truth data were created by manual tagging with an annotation tool. Three sets of images are prepared (a training set, a validation set, and a test set), and all images are resized to 640x640 pixels. The code is written in Python and executed on a laptop with an Intel Core i5 processor running at 3.20 GHz, 8 GB of memory, and a 64-bit operating system. The implementation draws on GitHub repositories such as “https://github.com/ultralytics/ultralytics.git”, “https://github.com/WongKinYiu/yolov9”, “https://github.com/zsyOAOA/ResShift”, “https://github.com/xinntao/ESRGAN”, “https://github.com/WongKinYiu/yolov7”, and “https://github.com/JingyunLiang/SwinIR”. The YOLOv5, YOLOv7, YOLOv8, and YOLOv9 models are trained from the pre-trained weights YOLOv5s.pt, YOLOv7.pt, YOLOv8s.pt, and gelan-c.pt, respectively. Default model settings, including the learning rate, activation function, and optimization technique, are used and are listed in Table 2. In addition, LHE with Bilateral Filtering and CLAHE are applied to enhance the thermal images.

Performance evaluation metrics

Performance evaluation metrics for detection.

Evaluating the system’s performance is crucial for determining its accuracy and robustness in YOLO-based eye detection in infrared thermal images of human faces. Several common assessment metrics are used to evaluate the eye localization system.

Precision: Precision measures the proportion of accurately identified eye regions among all regions that were assumed to be eyes. It is determined by dividing the total of true positive and false positive detections by the proportion of true positive (TP) detections which is given in Equation 14. Low numbers of false alarms are indicative of great precision.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{14}$$

Recall (Sensitivity): Recall is the proportion of ground-truth eye regions that are correctly detected. It is also called sensitivity or true positive rate (TPR). It is computed as the ratio of true positive detections to the sum of true positive and false negative (FN) detections, as given in Equation 15. High recall indicates few missed detections.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{15}$$
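As a minimal sketch, both ratios can be computed directly from detection counts. The function names, the convention of returning 0.0 when the denominator is zero, and the counts in the usage line are hypothetical illustrations, not values from our experiments; which detections count as TP, FP, or FN depends on the IoU matching rule.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted eye boxes that are correct: TP / (TP + FP)."""
    # Assumed convention: no predictions at all gives precision 0.0
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth eyes that were found: TP / (TP + FN)."""
    # Assumed convention: no ground-truth eyes gives recall 0.0
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical counts: 98 correct eye boxes, 2 false alarms, 1 missed eye
print(precision(98, 2), recall(98, 1))
```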

Intersection over Union (IoU): IoU measures how well a predicted bounding box overlaps the ground-truth box. As defined in Equation 16, it is the area of their intersection divided by the area of their union. Higher IoU values indicate better localization accuracy.

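$$\mathrm{IoU} = \frac{\lvert B_{p} \cap B_{gt} \rvert}{\lvert B_{p} \cup B_{gt} \rvert} \tag{16}$$

where $B_{p}$ is the predicted bounding box and $B_{gt}$ is the ground-truth bounding box.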
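The IoU definition translates into a few lines of code for axis-aligned boxes. This is a minimal sketch assuming boxes given as (x1, y1, x2, y2) pixel corners; YOLO tools commonly store boxes as normalised centre, width, and height, which would need converting first. The function name is illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap rectangle (zero if the boxes do not overlap)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 pixels in x and y: overlap 25, union 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```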

Average precision (AP): AP is the area under the precision-recall curve, which is traced out by varying the confidence score threshold. It summarizes detection quality over all thresholds as a single scalar between 0 and 1, with higher values indicating better performance.

mAP (mean Average Precision): mAP is the mean of the AP scores over all classes and gives an overall measure of the model’s object detection accuracy. Two variants are reported in this article. For mAP50, a detection counts as correct when its IoU with the ground-truth box is at least 0.5, a relatively lenient localization criterion. mAP50-95 averages the AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it additionally rewards tight alignment between predicted and ground-truth boxes. The appropriate metric depends on the application: mAP50 suffices when approximate localization is acceptable, whereas mAP50-95 gives a more comprehensive view of localization quality across IoU thresholds.
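To make the difference between mAP50 and mAP50-95 concrete, the sketch below evaluates a single class on a single image. Detections are ranked by confidence, greedily matched to unmatched ground-truth boxes at a given IoU threshold, and AP is taken as the area under the interpolated precision-recall curve (all-point interpolation). It is a simplified illustration under these assumptions, with function names of our own; it is not the evaluator in the Ultralytics or YOLOv9 repositories, which pool detections over the whole dataset and may use a different interpolation scheme. If eyes are annotated as a single class, mAP reduces to the AP of that class.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Tuple[float, Box]], gts: Sequence[Box], iou_thr: float) -> float:
    """AP for one class on one image at a single IoU threshold."""
    if not gts:
        return 0.0
    matched, flags = set(), []
    # Greedy matching in descending confidence; each ground truth can be matched once
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            flags.append(0)  # false positive
        else:
            matched.add(best_j)
            flags.append(1)  # true positive
    # Precision and recall after each ranked detection
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # All-point interpolation: each recall step times the best precision at or beyond it
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap


IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]  # 0.50, 0.55, ..., 0.95


def map50(preds, gts) -> float:
    return average_precision(preds, gts, 0.50)


def map50_95(preds, gts) -> float:
    return sum(average_precision(preds, gts, t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


gt = [(0, 0, 10, 10)]
pred = [(0.9, (0, 0, 10, 7.2))]  # IoU with the ground truth = 72 / 100 = 0.72
print(map50(pred, gt), map50_95(pred, gt))
```

In this example the single detection counts as correct only at thresholds 0.50 to 0.70, so mAP50 is 1.0 while mAP50-95 is 0.5, which illustrates why mAP50-95 is the stricter measure of localization.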

Performance evaluation metrics for super-resolution.

Evaluating the super-resolution step on the detected eye regions is also important. Because no high-resolution ground-truth images are available for comparison, we rely on metrics that do not require a reference image, together with a subjective evaluation of the enhanced images. Although NIQE, BRISQUE, and PIQE were originally developed for natural images, they have been applied effectively to thermal images because they quantify general perceptual artifacts and image degradation independent of the imaging modality [41,42].

NIQE (Natural Image Quality Evaluator): NIQE is a metric designed to assess the quality of natural images without relying on reference images [43]. It quantifies the level of distortion and artifacts present in an image by analyzing statistics related to natural image properties. Lower NIQE scores indicate higher image quality, making it a useful tool for evaluating the perceptual quality of images without the need for reference images.

BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator): BRISQUE is a reference-free image quality assessment metric that evaluates the spatial quality of images. BRISQUE analyzes statistical features extracted from an image and uses a trained model to predict the perceived image quality. Lower BRISQUE scores indicate better image quality.
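BRISQUE and NIQE share the same front end: each pixel is converted into a mean-subtracted contrast-normalised (MSCN) coefficient, (I − μ)/(σ + C), using a local Gaussian-weighted mean μ and standard deviation σ, and the metrics then model the statistics of these coefficients. The sketch below computes only this normalisation step, assuming a 7x7 Gaussian window with σ = 7/6 and C = 1 as commonly used for 8-bit images, plus replicate padding at the borders (our choice). Fitting the generalised Gaussian models and the trained regressor (BRISQUE) or multivariate Gaussian model (NIQE) that yield the final scores is omitted, and the function names are illustrative.

```python
import math
from typing import List


def gaussian_window(size: int = 7, sigma: float = 7 / 6) -> List[List[float]]:
    """Circularly symmetric Gaussian weights normalised to sum to 1."""
    half = size // 2
    weights = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                for dx in range(-half, half + 1)]
               for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def mscn(image: List[List[float]], c: float = 1.0) -> List[List[float]]:
    """Mean-subtracted contrast-normalised coefficients (I - mu) / (sigma + C)."""
    h, w = len(image), len(image[0])
    window = gaussian_window()
    half = len(window) // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            mu = second_moment = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    # Replicate-pad the borders by clamping coordinates
                    v = image[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    weight = window[di + half][dj + half]
                    mu += weight * v
                    second_moment += weight * v * v
            sigma = math.sqrt(max(second_moment - mu * mu, 0.0))
            row.append((image[i][j] - mu) / (sigma + c))
        out.append(row)
    return out


# A hypothetical 9x9 patch with a single warm pixel in the centre
patch = [[0.0] * 9 for _ in range(9)]
patch[4][4] = 255.0
coeffs = mscn(patch)
print(round(coeffs[4][4], 3), coeffs[0][0])
```

A perfectly flat patch maps to coefficients of zero, while local structure such as the warm pixel produces large positive or negative values; distortions change the shape of this distribution, which is what the metrics quantify.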

PIQE (Perception-based Image Quality Evaluator): PIQE is a non-reference metric that estimates perceived image quality from local, block-level distortions such as noise and blocking artifacts, measured in the spatially active regions of the image and pooled into an overall score. PIQE is designed to be sensitive to subtle distortions that may affect the perceived quality of an image. Higher PIQE scores indicate lower perceptual quality.

Comparative analysis of detection algorithms

Table 3 summarizes the ablation study on the CLAHE parameters clipLimit and tileGridSize. The YOLO models are trained on our own dataset without augmentation, using different combinations of clipLimit and tileGridSize. Based on this study, the CLAHE parameters are set to clipLimit = 2.0 and tileGridSize = (8, 8).
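The role of clipLimit can be illustrated with the contrast-limiting step that CLAHE applies to each tile. The sketch below builds the grey-level mapping for a single 8-bit tile: the tile histogram is clipped at clipLimit times the mean bin count (our understanding of how OpenCV scales this parameter), the clipped excess is spread uniformly over all bins, and the scaled cumulative histogram gives the output levels. The function name is hypothetical. A full implementation such as OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))` additionally splits the image into the tile grid and bilinearly interpolates between neighbouring tile mappings to avoid block artifacts.

```python
from typing import List, Sequence


def clahe_tile_lut(tile: Sequence[int], clip_limit: float = 2.0, n_bins: int = 256) -> List[int]:
    """Contrast-limited equalisation mapping for one tile of 8-bit pixels."""
    total = len(tile)
    hist = [0] * n_bins
    for value in tile:
        hist[value] += 1

    # Absolute clip threshold: clip_limit times the mean bin count (at least 1)
    limit = max(int(clip_limit * total / n_bins), 1)
    excess = 0
    for i, count in enumerate(hist):
        if count > limit:
            excess += count - limit
            hist[i] = limit

    # Spread the clipped excess uniformly; the remainder goes to the first bins
    bonus, remainder = divmod(excess, n_bins)
    hist = [count + bonus for count in hist]
    for i in range(remainder):
        hist[i] += 1

    # Cumulative histogram scaled to [0, n_bins - 1] gives the output grey levels
    lut, cumulative = [], 0
    for count in hist:
        cumulative += count
        lut.append(round(cumulative * (n_bins - 1) / total))
    return lut


# A nearly flat 8x8 tile, as in a low-contrast thermal region (hypothetical values)
tile = [100] * 32 + [101] * 32
limited = clahe_tile_lut(tile, clip_limit=2.0)
unlimited = clahe_tile_lut(tile, clip_limit=1000.0)  # large limit: nothing is clipped
print(limited[101] - limited[100], unlimited[101] - unlimited[100])
```

In this example the one-level difference between grey values 100 and 101 becomes a gap of 4 output levels with clip_limit = 2.0, but 127 levels when nothing is clipped (plain histogram equalisation of the tile). Limiting the slope of the mapping in this way keeps CLAHE from over-amplifying noise in nearly uniform thermal regions.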

Table 3. Analyzing the Impact of CLAHE parameter Variations on Different YOLO Models on the dataset without Augmentation.

https://doi.org/10.1371/journal.pone.0328227.t003

The performance of the YOLO models is assessed on raw images, on images processed with local histogram equalization (LHE) and bilateral filtering, and on CLAHE-enhanced images. Table 4 summarizes and compares the precision, recall, mAP50, and mAP50-95 of the different YOLO models for eye localization in thermal images.

Table 4. Performance Metrics Comparison for Various Detection Algorithms.

https://doi.org/10.1371/journal.pone.0328227.t004

In general, augmentation improves precision, recall, and mAP for most models, indicating that it benefits these detectors. On raw thermal images without augmentation, YOLOv5, YOLOv7, YOLOv8, and YOLOv9 all perform noticeably worse; introducing augmentation improves performance markedly, especially for YOLOv5, YOLOv8, and YOLOv9, with higher precision, recall, and mAP50-95. Across enhancement techniques, CLAHE-enhanced thermal images consistently give the best results on all metrics, followed closely by LHE with bilateral filtering, and both consistently outperform raw thermal images in precision, recall, and mAP. The best overall result is obtained with YOLOv8 on CLAHE-enhanced augmented thermal images: precision of 1, recall of 1, mAP50 of 0.995, and mAP50-95 of 0.796. YOLOv5 and YOLOv9 follow closely; YOLOv9 attains a precision of 0.998, recall of 0.998, mAP50 of 0.995, and mAP50-95 of 0.753. This combination enables efficient and accurate identification of eyes in thermal images of human faces.

Fig 6 shows sample raw and CLAHE-enhanced thermal images with the corresponding eye localization outputs of YOLOv5, YOLOv7, YOLOv8, and YOLOv9. The precision, recall, and mAP metrics and the loss curves obtained by the YOLO variants on the augmented dataset are shown in Figs 7-10. The loss curves indicate that YOLOv8 and YOLOv9 trained successfully: both training and validation losses decrease and stabilize, which suggests good generalization. Fig 11 compares the mAP50-95 values of the YOLO models on thermal images with and without augmentation, and shows that YOLOv8 with augmented images achieves the best mAP50-95.

Fig 6. Sample input image and eye-detected image using different YOLO models.

A comparative analysis of eye detection on thermal face images using different YOLO models. The first row shows raw images without augmentation, the second row raw images with augmentation, the third row CLAHE-enhanced images without augmentation, and the fourth row CLAHE-enhanced images with augmentation. The detection results highlight how augmentation and CLAHE enhancement improve feature visibility and detection accuracy across YOLO models.

https://doi.org/10.1371/journal.pone.0328227.g006

Fig 7. Performance metrics for raw thermal images.

This graph shows precision, recall, and mAP variations across different YOLO models trained on raw thermal images.

https://doi.org/10.1371/journal.pone.0328227.g007

Fig 8. Performance metrics for CLAHE-enhanced thermal images.

This graph highlights the improvements in detection accuracy due to contrast enhancement using CLAHE, as measured across YOLO variants.

https://doi.org/10.1371/journal.pone.0328227.g008

Fig 9. Loss graph for raw thermal images.

This figure illustrates training and validation loss convergence across different YOLO models on raw thermal images.

https://doi.org/10.1371/journal.pone.0328227.g009

Fig 10. Loss graph for CLAHE-enhanced thermal images.

This graph demonstrates the effect of CLAHE preprocessing on model stability and convergence during training.

https://doi.org/10.1371/journal.pone.0328227.g010

Fig 11. Comparison of mAP50-95 value.

Compares mAP50-95 across YOLO variants on raw, CLAHE-enhanced, and LHE + Bilateral filtered thermal images, with and without augmentation. Enhancements and augmentation improve detection accuracy, with YOLOv8 and YOLOv9 achieving the best performance.

https://doi.org/10.1371/journal.pone.0328227.g011

The YOLOv8 family comprises several model sizes (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x), and the most suitable one depends on the application. If speed is critical, YOLOv8n or YOLOv8s is appropriate; YOLOv8m offers a good balance between speed and accuracy; and if the highest accuracy is the priority, YOLOv8l or YOLOv8x is the best choice, at the cost of slower inference. Table 5 analyzes the pre-trained YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv8l models on CLAHE-enhanced images, with and without augmentation.

In terms of precision, recall, mAP50, and mAP50-95, YOLOv8m performs best with augmentation, achieving precision = 1, recall = 1, mAP50 = 0.995, and mAP50-95 = 0.801. YOLOv8l follows closely, with comparable performance on all metrics. YOLOv8n and YOLOv8s also perform well, but YOLOv8m achieves the highest accuracy. Fig 12 compares mAP50-95 for CLAHE-enhanced images with and without augmentation and shows that YOLOv8m on CLAHE-enhanced images achieves the highest accuracy in detecting eyes in thermal images.

Fig 12. Performance analysis of different weights of YOLOv8 on CLAHE-enhanced images.

Compares mAP50-95 for different YOLOv8 weights (n, s, m, l) on CLAHE-enhanced images, with and without augmentation. Augmentation improves accuracy, with YOLOv8m and YOLOv8l performing best.

https://doi.org/10.1371/journal.pone.0328227.g012

Each YOLO model has specific strengths and weaknesses. YOLOv5 is highly precise and benefits from data augmentation, making it suitable where high precision is critical, but its recall on raw images is weaker. YOLOv7 offers balanced performance with high recall, making it reliable across diverse scenarios, though it benefits less from some enhancement techniques. YOLOv8 provides strong overall performance and excels on CLAHE-enhanced images, making it versatile, although it may require more computational resources. YOLOv9 achieves excellent precision and recall with suitable enhancement and augmentation, but its recall on raw images is less robust.

Table 6 summarizes the descriptive statistics of precision, recall, mAP50, and mAP50-95 for the YOLO models on augmented images. YOLOv8 has the highest precision and recall with minimal variability, while YOLOv5 also achieves high precision and recall but is less consistent than YOLOv8. YOLOv7 has the lowest precision and the highest variability, and YOLOv9 shows substantial variability in both precision and recall despite recall values comparable to YOLOv5 and YOLOv8. For mAP50 and mAP50-95, all models perform consistently, with identical mean values and very low standard deviations. Overall, YOLOv8 proved the best fit for our task, which requires both speed and accuracy, making it well suited to real-time applications. We attribute this mainly to its well-balanced architecture, including its anchor-free, decoupled detection head, which suits small objects such as eyes. Although YOLOv9 introduces accuracy-oriented features such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), YOLOv8’s emphasis on speed and precision makes it the better choice for detecting human eyes in low-resolution thermal images. The precision-recall curves support this conclusion: on our thermal image dataset, YOLOv8 achieves a higher proportion of true positives with fewer false positives than YOLOv9.

Table 6. Descriptive Statistics for the performance of different YOLO models.

https://doi.org/10.1371/journal.pone.0328227.t006

The superior performance of the CLAHE and YOLOv8 combination stems from the synergy between enhanced contrast and an advanced detection architecture. Thermal eye images are inherently low in texture and contain only subtle intensity variations; CLAHE improves local contrast and sharpens edge definition, making these features more visible. YOLOv8, with its anchor-free detection head, decoupled classification and regression branches, and CSP-based backbone, detects these refined features effectively, whereas earlier YOLO versions rely on anchor-based detection, which can struggle with low-texture data. By comparison, methods such as global histogram equalization (HE) or gamma correction often lose local detail. As a result, this combination consistently yields higher precision, recall, and mAP than the other approaches.

Comparing super-resolution algorithms through non-reference metrics

After the eyes are detected by the trained and validated YOLO models, the predicted bounding box coordinates are used to crop the eye regions for further analysis. The ROI extraction is shown in Fig 13. The extracted eye regions, however, often have low resolution. Because applying super-resolution to the whole thermal image increases computational cost, we apply it only to the cropped eye regions (ROIs), which greatly reduces complexity while improving the quality of the region we care about. To raise the resolution of these ROIs by a factor of x4, we apply several super-resolution algorithms: bicubic interpolation, BSRGAN, ESRGAN, Real-ESRGAN, SwinIR, SwinIR-Large, and ResShift. All methods use the authors’ pre-trained models with unmodified hyperparameters, as listed in Table 7. Every technique is run with its default settings and configuration, which gives a consistent evaluation framework and a fair comparison of established architectures and training regimes. Table 8 presents the performance of these algorithms on the non-reference metrics NIQE, BRISQUE, and PIQE.
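The cropping step can be sketched as follows. YOLO-style detectors commonly report boxes as normalised centre coordinates, width, and height, so the sketch first converts a box to pixel corners, expands it by a margin, clips it to the image bounds, and slices the ROI out of an image stored as a list of rows. Fig 13 shows such an expansion, but the margin fraction used here is a hypothetical value, and all function names are illustrative. With a NumPy array the final slice would be `image[y1:y2, x1:x2]`.

```python
import math
from typing import List, Tuple

PixelBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive


def yolo_to_corners(xc: float, yc: float, w: float, h: float,
                    img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a normalised (x_center, y_center, width, height) box to pixel corners."""
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


def expand_and_clip(box, margin: float, img_w: int, img_h: int) -> PixelBox:
    """Grow the box by `margin` of its size on every side, then clip to the image."""
    x1, y1, x2, y2 = box
    dx, dy = (x2 - x1) * margin, (y2 - y1) * margin
    return (max(0, math.floor(x1 - dx)), max(0, math.floor(y1 - dy)),
            min(img_w, math.ceil(x2 + dx)), min(img_h, math.ceil(y2 + dy)))


def crop(image: List[List[float]], box: PixelBox) -> List[List[float]]:
    """Slice the ROI out of an image stored as a list of rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


# Hypothetical 64x32 image and one detected eye box in normalised coordinates
image = [[float(x + y) for x in range(64)] for y in range(32)]
corners = yolo_to_corners(0.5, 0.5, 0.25, 0.5, img_w=64, img_h=32)
roi_box = expand_and_clip(corners, margin=0.25, img_w=64, img_h=32)
roi = crop(image, roi_box)
print(roi_box, len(roi[0]), len(roi))
```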

Table 7. Hyperparameters Used by Different Super-Resolution Models.

https://doi.org/10.1371/journal.pone.0328227.t007

Table 8. Performance Analysis of Super-resolution Algorithms.

https://doi.org/10.1371/journal.pone.0328227.t008

Fig 13. ROI extraction.

Illustrates the Region of Interest (ROI) extraction process for thermal eye detection. The left image shows the detected eye region in a thermal face image. The center diagram represents the bounding box expansion around the detected eyes, ensuring optimal coverage for feature extraction. The right image displays the final extracted eye ROIs, which are used for further analysis and processing.

https://doi.org/10.1371/journal.pone.0328227.g013

Each resolution enhancement technique is assessed by comparing its NIQE, BRISQUE, and PIQE values, where lower values generally indicate better image quality. Fig 14 shows a scatter plot of these non-reference metrics for a few raw and enhanced sample images. Bicubic interpolation has relatively high scores, indicating that it achieves only limited improvement. BSRGAN performs competitively across image types, with only minor variations in scores. On the naturalness and spatial quality metrics NIQE and BRISQUE, ESRGAN consistently achieves low scores, suggesting that it preserves the natural appearance and spatial detail of the images effectively. On the perceptual quality metric PIQE, ESRGAN again performs best, with Real-ESRGAN, SwinIR, SwinIR-Large, and ResShift as competitive alternatives. Figs 15 and 16 show the outputs of the different super-resolution algorithms.

Fig 14. Scatter Plot of non-reference metrics for a few sample images.

Compares the non-reference image quality metrics (NIQE, BRISQUE, and PIQE) for different super-resolution techniques on raw and CLAHE-enhanced images. The left plot represents raw images, while the right plot shows CLAHE-enhanced images. The results highlight the impact of CLAHE processing on improving image quality and the effectiveness of deep learning-based super-resolution methods over traditional interpolation techniques.

https://doi.org/10.1371/journal.pone.0328227.g014

Fig 15. Sample raw image vs. Super-Resolution Results.

Compares a raw thermal eye image with outputs from different super-resolution techniques, highlighting improvements in clarity, edge sharpness, and thermal feature preservation for better analysis.

https://doi.org/10.1371/journal.pone.0328227.g015

Fig 16. Sample CLAHE enhanced image vs. Super-Resolution Results.

Presents a CLAHE-enhanced thermal eye image (top-left) alongside outputs from various super-resolution techniques. The results highlight the combined effect of contrast enhancement and high-resolution reconstruction, aiding in more precise thermal image analysis.

https://doi.org/10.1371/journal.pone.0328227.g016

Subjective assessment of images enhanced through super-resolution

Non-reference metrics such as PIQE, BRISQUE, and NIQE rely on statistical features rather than on human visual preferences and context-specific factors. Consequently, they do not fully capture the intricacies of human visual perception, and their scores can disagree with human perceptual judgments [44]; objective and subjective quality assessments do not always correlate strongly [45]. A subjective assessment of the super-resolved images is therefore essential: it evaluates the perceived improvement in quality and shows how the different algorithms affect visual perception, which objective metrics cannot fully capture. In our study, subjective assessment uses the single stimulus method, in which images are presented one at a time and participants rate each image’s quality before moving on to the next [46]. This method uses the Absolute Category Rating scale, on which subjects grade image quality on five levels: bad, poor, fair, good, and excellent [47]. Table 9 shows the quality grades and the score assigned to each. The method simplifies the assessment process but can prolong testing for large image sets. Factors such as image content influence participants’ opinions, and mean opinion scores (MOS) are calculated from their ratings [48] using Equation 17.

$$\mathrm{MOS} = \frac{1}{N}\sum_{n=1}^{N} S_{n} \tag{17}$$

where $\sum_{n=1}^{N} S_{n}$ is the sum of the individual scores $S_{n}$ given by the subjects, and dividing by $N$, the total number of subjects, yields the average.

Following the ITU-R BT.500-12 recommendations [49], the tests are conducted in a controlled environment on a 4K resolution monitor. The participant pool consists of 20 individuals, mostly research students familiar with multimedia applications and image quality. Participants are briefed on the test procedure before evaluating 15 images, 5 from each category: raw extracted thermal eye images, CLAHE-enhanced eye images, and eye images processed with LHE and bilateral filtering. Each participant repeats the scoring five times, and the MOS is calculated from these repeated assessments. Given the display size, participants are encouraged to take the test at a relaxed pace without time limits and may sit at whatever viewing distance they find comfortable. The scores recorded for each super-resolution algorithm applied to the test images are used to calculate the MOS of each technique, and the consistent trends across repetitions support the reliability of the subjective evaluation. Fig 17 shows sample images with their calculated MOS values. BSRGAN, Real-ESRGAN, SwinIR, SwinIR-Large, and ResShift perform comparably and surpass ESRGAN. The MOS results are summarized in Table 10, and the pie chart in Fig 18 illustrates the mean MOS values. SwinIR-Large performs strongly, while BSRGAN, Real-ESRGAN, SwinIR, and ResShift also perform well with slight variations.
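Equation 17, combined with the repeated scoring described above, can be sketched as follows. We assume that Table 9 follows the usual ACR mapping of bad = 1 up to excellent = 5, and that each subject's repetitions are averaged into a single score S_n before averaging across subjects; with the same number of repetitions per subject this equals the grand mean of all ratings. The function name and the ratings are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Assumed ACR mapping (standard five-point scale)
ACR_SCORES: Dict[str, int] = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}


def mean_opinion_score(ratings: List[List[str]]) -> float:
    """MOS for one image and one super-resolution method.

    ratings[n] holds subject n's repeated ACR labels; the repetitions are
    averaged into S_n, and MOS = (1/N) * sum(S_n) over the N subjects.
    """
    per_subject = [mean(ACR_SCORES[label] for label in reps) for reps in ratings]
    return sum(per_subject) / len(per_subject)


# Hypothetical ratings from three subjects, two repetitions each
ratings = [["good", "excellent"], ["fair", "good"], ["good", "good"]]
print(mean_opinion_score(ratings))
```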

Fig 17. Sample Images with MOS value.

Sample images with their assigned MOS on a five-point scale, assessing quality in terms of sharpness, clarity, and feature preservation. The ratings provide a subjective comparison of super-resolution performance across the different methods.

https://doi.org/10.1371/journal.pone.0328227.g017

Fig 18. Pie chart illustrating the comparison of Mean Opinion Scores.

Compares MOS for super-resolution techniques on raw and CLAHE-enhanced images. SwinIR-Large receives the largest share of the total MOS (18%), indicating superior perceptual quality, while bicubic interpolation scores lowest. CLAHE enhancement improves MOS for most methods, with BSRGAN, Real-ESRGAN, and SwinIR showing notable gains.

https://doi.org/10.1371/journal.pone.0328227.g018

Fig 19 illustrates the complete workflow from the initial raw thermal image through the stages of contrast enhancement, eye region detection, extraction, and super-resolution processing. The choice of a super-resolution algorithm involves balancing image quality against computational complexity. Traditional methods like Bicubic Interpolation offer quick and resource-efficient solutions at the expense of finer details, while advanced techniques like ESRGAN, Real-ESRGAN, and SwinIR provide superior image quality with higher computational costs. While ResShift is less demanding than GAN-based methods, it still requires considerable computational resources due to its deep network architecture. Understanding these trade-offs is crucial for selecting the appropriate method based on specific application requirements and available computational resources.
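The low cost of bicubic interpolation follows from its structure: each output sample is a fixed weighted sum of the four nearest input samples, with weights given by Keys' cubic convolution kernel. The one-dimensional sketch below upsamples a row by a factor of 4; applying it along rows and then along columns gives 2D bicubic upscaling. We assume Keys' parameter a = -0.5, a half-pixel coordinate mapping, and edge replication; libraries differ in these choices (some use a = -0.75), so outputs will not match any particular implementation exactly. The function names are illustrative.

```python
import math
from typing import List, Sequence


def keys_kernel(x: float, a: float = -0.5) -> float:
    """Cubic convolution kernel (Keys); zero outside |x| < 2."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def bicubic_upscale_1d(signal: Sequence[float], factor: int = 4, a: float = -0.5) -> List[float]:
    """Upsample a 1D signal; apply along rows, then columns, for a 2D image."""
    n = len(signal)
    out = []
    for k in range(n * factor):
        # Map the centre of output sample k back to input coordinates (half-pixel convention)
        x = (k + 0.5) / factor - 0.5
        base = math.floor(x)
        value = 0.0
        for m in range(base - 1, base + 3):        # the four nearest input samples
            sample = signal[min(max(m, 0), n - 1)]  # replicate the edge samples
            value += sample * keys_kernel(x - m, a)
        out.append(value)
    return out


# A sharp step, e.g. a hypothetical warm/cool boundary along one row
row = [0, 0, 0, 255, 255, 255]
up = bicubic_upscale_1d(row, factor=4)
print(len(up), round(min(up), 1), round(max(up), 1))
```

The output dips slightly below 0 and rises slightly above 255 next to the step. These undershoots and overshoots come from the kernel's negative lobes, which make bicubic sharper than bilinear interpolation, but no fixed kernel can recreate detail that is absent from the low-resolution input.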

Fig 19. Complete Flow of Results of Thermal Image Processing for Human Eye Detection and Enhancement.

Illustrates the workflow of thermal eye detection and super-resolution enhancement. The left section shows input thermal images and ROI extraction for the left and right eyes, while the right section compares super-resolution results using various techniques. Deep learning-based methods significantly improve clarity and feature preservation over traditional interpolation.

https://doi.org/10.1371/journal.pone.0328227.g019

Practical Implication and Potential Application

The proposed technology for accurate eye detection in thermal images has significant potential in various real-world applications. In biometrics, it enhances facial and iris recognition systems, improving security in low-light conditions. In healthcare, it facilitates patient monitoring, ophthalmology, and telemedicine by accurately tracking vital signs, ocular temperature, and stress levels through thermal imaging. For driver monitoring, it enables the detection of drowsiness and distraction, thereby enhancing road safety. In human-computer interaction, precise eye detection can be used for gaze tracking and emotion recognition, improving user experience and accessibility. This technology offers robust solutions across multiple fields where traditional imaging methods fall short due to poor lighting or visibility conditions. Implementing the proposed eye detection and enhancement pipeline in real-time systems like driver monitoring or biometric authentication is feasible but challenging. It requires high-performance hardware (e.g., GPUs, AI accelerators) and optimization techniques (e.g., model pruning, quantization, parallel processing) to handle the computational load and reduce latency. Additionally, integrating quality thermal cameras and ensuring robust performance under varying conditions is crucial. Reliable and efficient deployment is achievable with appropriate hardware, optimization, and extensive real-world testing.

While these technological advances are significant, the ethical considerations of using eye detection technology in sensitive thermal applications must also be addressed. Protecting privacy requires informed consent, anonymization, and data minimization, and transparent communication and voluntary participation are needed to ensure proper consent. Data must be stored securely, and diverse datasets are necessary to avoid bias. Ethical guidelines should be in place to prevent misuse of the technology for increased surveillance or discriminatory practices. Addressing these considerations is essential for the responsible use of eye detection technology in thermal applications.

Limitations and Future Improvement

Super-resolution techniques are effective but can introduce artifacts. Evaluation with non-reference metrics and subjective mean opinion scores may lack objectivity, and pre-trained models may carry inherent biases. Although PIQE, BRISQUE, and NIQE provide insight into thermal image quality, they were designed for natural images; future work could develop task-specific non-reference image quality assessment methods for more accurate evaluation of super-resolution in thermal imaging. Deep learning-based methods are also computationally demanding, which limits their suitability for real-time applications. Possible improvements include adaptive pre-processing to enhance thermal image quality, fine-tuning the detection models on more diverse datasets, developing hybrid super-resolution methods that balance accuracy and computational efficiency, and conducting extensive real-world testing. The collected thermal images provide temperature data for the detected eye regions, but a more diverse dataset, capturing variations in eye conditions, geographic regions, race, and age groups, is needed to support applications in healthcare and ophthalmology. Such advances would improve the robustness, efficiency, and fairness of the approach and broaden its potential uses.

Conclusion

This study presents an approach to eye detection and resolution enhancement in thermal facial images that integrates deep learning (in particular YOLOv8 and YOLOv9), image enhancement (CLAHE), and super-resolution techniques. The combination achieves high accuracy (precision and recall of 1, mAP50 of 0.995) even on challenging low-resolution, low-contrast thermal data. The key contributions are a pipeline that combines CLAHE pre-processing, YOLO-based eye detection, and super-resolution, and a new dataset of thermal facial images with carefully labeled eye locations. While YOLOv8 currently performs best, YOLOv9’s potential warrants further exploration through ablation studies. Future research directions include multimodal biometric systems, advanced deep learning architectures, continuous learning strategies, and robust super-resolution algorithms tailored to thermal images, aiming to further improve eye detection and extend its use across various domains.

Supporting information

S1 File. This file contains Supplementary Figures S1-S25.

https://doi.org/10.1371/journal.pone.0328227.s001

(ZIP)

Acknowledgments

The authors would like to thank the Vellore Institute of Technology, Chennai, for providing the necessary facilities and support to conduct this research and all volunteers who participated in the data collection process.

References

1. Farooq MA, Shariff W, O’callaghan D, Merla A, Corcoran P. On the role of thermal imaging in automotive applications: a critical review. IEEE Access. 2023;11:25152–73.
2. Kesztyüs D, Brucher S, Wilson C, Kesztyüs T. Use of infrared thermography in medical diagnosis, screening, and disease monitoring: a scoping review. Med. 2023;59(12):2139. pmid:38138242; PubMed Central PMCID: PMC10744680.
3. Persiya J, Sasithradevi A. Thermal mapping the eye: a critical review of advances in infrared imaging for disease detection. J Therm Biol. 2024;121:103867. pmid:38744026
4. Persiya J, Sasithradevi A, Roomi SMM. Infrared Thermograms for Diagnosis of Dry Eye: A Review. In: 2023 International Conference on Bio Signals, Images, and Instrumentation (ICBSII), 2023. 1–6. https://doi.org/10.1109/icbsii58188.2023.10181092
5. Lu Y, Lu G. An Alternative of LiDAR in Nighttime: Unsupervised Depth Estimation Based on Single Thermal Image. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021. 3832–42. https://doi.org/10.1109/wacv48630.2021.00388
6. Manullang MCT, Lin Y-H, Lai S-J, Chou N-K. Implementation of thermal camera for non-contact physiological measurement: a systematic review. Sensors (Basel). 2021;21(23):7777. pmid:34883780; PubMed Central PMCID: PMC8659982.
7. Persiya J, Sasithradevi A, Mohamed Mansoor Roomi S. Infrared thermography in diagnosing macular edema. Data Analytics for Intelligent Systems. IOP Publishing. 2024. p. 8–16. https://doi.org/10.1088/978-0-7503-5417-2ch8
8. Picking a Thermal Color Palette [Internet]. 2021 [cited 2024 Aug 2]. Available from: https://www.flir.com/discover/industrial/picking-a-thermal-color-palette/
9. Dulski R, Powalisz P, Kastek M, Trzaskawka P. Enhancing image quality produced by IR cameras. Electro-Optical Infrared Syst Technol Appl VII [Internet]. 2010;7834(May):783415.
10. Madura Meenakshi R, Padmapriya N, Venkateswaran N, Ravikumar R, Chelliah R. Localization of eye region in infrared thermal images using deep neural network. 2021 Int Conf Wirel Commun Signal Process Networking, WiSPNET 2021 [Internet]. 2021;446–50. https://doi.org/10.1109/WiSPNET51692.2021.9419446
11. Ilikci B, Chen L, Cho H, Liu Q. Heat-Map Based Emotion and Face Recognition from Thermal Images. In: 2019 Computing, Communications and IoT Applications (ComComAp), 2019. 449–53. https://doi.org/10.1109/comcomap46287.2019.9018786
12. Ghourabi M, Mourad-Chehade F, Chkeir A. Eyes Recognition for Inner Canthus Temperature Detection using YOLOv5 Algorithm. In: 2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2022. 1–5. https://doi.org/10.1109/cisp-bmei56279.2022.9980195
13. Ghourabi M, Mourad-Chehade F, Chkeir A. Eye recognition by YOLO for inner canthus temperature detection in the elderly using a transfer learning approach. Sensors. 2023;23(4):1851. pmid:36850447; PubMed Central PMCID: PMC9964838.
14. Klaib AF, Alsrehin NO, Melhem WY, Bashtawi HO, Magableh AA. Eye tracking algorithms, techniques, tools, and applications with an emphasis on machine learning and internet of things technologies. Expert Syst Appl. 2021;166:114037.
15. Budzan S, Wyżgolik R. Face and eyes localization algorithm in thermal images for temperature measurement of the inner canthus of the eyes. Infrared Phys Technol. 2013;60:225–34. pmid:32288545; PubMed Central PMCID: PMC7110491.
16. Knapik M, Cyganek B. Fast eyes detection in thermal images. Multimed Tools Appl. 2020;80(3):3601–21.
17. Zhou Z, Fang Z, Wang J, Chen J, Li H, Han L, et al. Driver vigilance detection based on deep learning with fused thermal image information for public transportation. Eng Appl Artif Intell. 2023;124:106604.
18. Maiti S, Gupta A. Local eye-net: an attention based deep learning architecture for localization of eyes. Expert Syst Appl. 2024;239:122416.
19. Mostafa E, Hammoud R, Ali A, Farag A. Face recognition in low resolution thermal images. Comput Vis Image Underst. 2013;117(12):1689–94.
20. Goulart C, Valadão C, Delisle-Rodriguez D, Funayama D, Favarato A, Baldo G, et al. Visual and thermal image processing for facial specific landmark detection to infer emotions in a child-robot interaction. Sensors (Switzerland). 2019;19(13):2844. pmid:31248004; PubMed Central PMCID: PMC6650968.
21. Zhao Z, Zhang Y, Li C, Xiao Y, Tang J. Thermal UAV image super-resolution guided by multiple visible cues. IEEE Trans Geosci Remote Sens. 2023;61:1–14.
22. Batchuluun G, Kang JK, Nguyen DT, Pham TD, Arsalan M, Park KR. Deep learning-based thermal image reconstruction and object detection. IEEE Access. 2021;9:5951–71.
23. Suárez PL, Carpio D, Sappa AD. Enhancement of guided thermal image super-resolution approaches. Neurocomputing. 2024;573:127197.
  24. 24. Lu Y, Lu G. SuperThermal: matching thermal as visible through thermal feature exploration. IEEE Robot Autom Lett. 2021;6(2):2690–7.
  25. 25. Zhao K, Yuan K, Sun M, Li M, Wen X. Quality-aware Pretrained Models for Blind Image Quality Assessment. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 22302–13. https://doi.org/10.1109/cvpr52729.2023.02136
  26. 26. Guo Q, Wen J. Multi-level Fusion Based Deep Convolutional Network for Image Quality Assessment. In: Del Bimbo A, Cucchiara R, Sclaroff S, Farinella GM, Mei T, Bertini M, et al., editors. BT - Pattern Recognition ICPR International Workshops and Challenges ICPR 2021 Lecture Notes in Computer Science() [Internet]. Cham: Springer International Publishing; 2021. p. 670–8. https://doi.org/10.1007/978-3-030-68780-9_51
  27. 27. Lv H, Shan P, Shi H, Zhao L. An adaptive bilateral filtering method based on improved convolution kernel used for infrared image enhancement. SIViP. 2022;16(8):2231–7.
  28. 28. Lee J, Pant SR, Lee H-S. An adaptive histogram equalization based local technique for contrast preserving image enhancement. Int J Fuzzy Log Intell Syst. 2015;15(1):35–44.
  29. 29. Lazarevich I, Grimaldi M, Kumar R, Mitra S, Khan S, Sah S. YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems. 2023. http://arxiv.org/abs/2307.13901
  30. 30. Wang CY, Mark Liao HY, Wu YH, Chen PY, Hsieh JW, Yeh IH. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. In: IEEE Comput Soc Conf Comput Vis Pattern Recognit Work, 2020. 1571–80. http://arxiv.org/abs/1911.11929
  31. 31. Wang CY, Bochkovskiy A, Liao HYM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 2022;1–15. Available from: http://arxiv.org/abs/2207.02696
  32. 32. Liu S, Qi L, Qin H, Shi J, Jia J. Path Aggregation Network for Instance Segmentation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit [Internet]. 2018;8759–68.
  33. 33. Reis D, Kupec J, Hong J, Daoudi A. Real-Time Flying Object Detection with YOLOv8. 2023; Available from: http://arxiv.org/abs/2305.09972
  34. 34. Terven J, Cordova-Esparza D. A comprehensive review of YOLO: From YOLOv1 and beyond. 2023;1–33. http://arxiv.org/abs/2304.00501
  35. 35. Wang CY, Yeh IH, Liao HYM. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. 2024; Available from: http://arxiv.org/abs/2402.13616
  36. 36. Liu J, Gan Z, Zhu X. Directional Bicubic Interpolation — A New Method of Image Super-Resolution. Proc 3rd Int Conf Multimed Technol [Internet], 2013;84.
  37. 37. Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, et al. ESRGAN: enhanced super-resolution generative adversarial networks. Lect Notes Comput Sci. 2019;11133 LNCS:63–79. https://doi.org/10.1007/978-3-030-11021-5_5
  38. 38. Wang X, Xie L, Dong C, Shan Y. Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data. Proc IEEE Int Conf Comput Vis [Internet]. 2021-Octob:1905–14.
  39. 39. Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R. SwinIR: Image Restoration Using Swin Transformer. Proc IEEE Int Conf Comput Vis [Internet]. 2021;1833–44.
  40. 40. Yue Z, Wang J, Loy CC. ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting. 2023;(NeurIPS):1–19. Available from: http://arxiv.org/abs/2307.12348
  41. 41. Jiménez-Gaona Y, Carrión-Figueroa D, Lakshminarayanan V, José Rodríguez-Álvarez M. Gan-based data augmentation to improve breast ultrasound and mammography mass classification. Biomed Signal Process Control. 2024;94:106255.
  42. 42. Laidouni MZ, Bondžulić BP, Bujaković DM, Petrović VS, Adli T, Andrić MS. Bimodal and trimodal image fusion: a study of subjective scores and objective measures. J Electr Eng. 2025;76(1):7–17.
  43. 43. Mittal A, Soundararajan R, Bovik AC. Making a “Completely Blind” image quality analyzer. IEEE Signal Process Lett. 2013;20(3):209–12.
  44. 44. Zvezdakova A, Kulikov D, Kondranin D, Vatolin D. Barriers towards no-reference metrics application to compressed video quality analysis: on the example of no-reference metric NIQE. CEUR Workshop Proc [Internet]. 2019;2485:22–7.
  45. 45. Voznesensky AS, Sinitca AM, Shalugin ED, Antonov SA, Kaplun DI. No-Reference metrics for images quality estimation in a face recognition task. Lect Notes Networks Syst [Internet]. 2023;702 LNNS(June):462–74.
  46. 46. Mohammadi P, Ebrahimi-Moghadam A, Shirani S. Subjective and Objective Quality Assessment of Image: A Survey. 2014;(June). Available from: http://arxiv.org/abs/1406.7799
  47. 47. Zhang H, Li D, Yu Y, Guo N. Subjective and objective quality assessments of display products. Entropy. 2021;23(7):814. pmid:34206721; PubMed Central PMCID: PMC8306303.
  48. 48. Hu B, Li L, Wu J, Qian J. Subjective and objective quality assessment for image restoration: a critical survey. Signal Process Image Commun. 2020;85:115839.
  49. 49. Bt I r, Bt I r, Itu-r Q, Itu-r Q, Itu T, Itu T, et al. Recommendation ITU-R BT.500-11 Methodology for the subjective assessment of the quality of television pictures. Methodology [Internet]. 2002;12:1–48. Available from: https://www.itu.int/rec/R-REC-BT.500