
Improving object detection in challenging weather for autonomous driving via adversarial image translation

  • Kunyi Wang,

    Roles Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliation University of British Columbia, Vancouver, BC, Canada

  • Yaohua Zhao

    Roles Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    zhaoyaohua2025@163.com

Affiliation School of Traffic & Transportation Engineering, Central South University, Changsha, China

Abstract

Vision-based environmental perception is fundamental to autonomous driving, as it enables reliable detection and recognition of diverse objects in complex traffic environments. However, adverse weather conditions (such as rain, fog, and low light) significantly degrade image quality, undermining the reliability of object detection algorithms. To address this challenge, we propose a two-stage framework designed to enhance object detection under adverse conditions. In the first stage, we design a lightweight Pix2Pix-based generative adversarial network (LP-GAN) that translates adverse-weather images into clear-weather counterparts, thereby alleviating visual degradation. In the second stage, the translated images are processed by a state-of-the-art object detector (YOLOv5) to enhance robustness and accuracy. Extensive experiments on the CARLA simulator demonstrate that the proposed framework substantially improves detection performance across diverse adverse conditions. Furthermore, the generated clear-weather images provide faithful and interpretable visual representations, which can facilitate human understanding and decision-making in autonomous driving. Overall, the proposed framework offers a practical and effective solution for weather-robust object detection, contributing to safer and more reliable autonomous driving.

1. Introduction

In recent years, autonomous driving has emerged as a transformative paradigm that is reshaping the future of transportation, with the potential to deliver unprecedented advances in safety, efficiency, and accessibility [1–3]. At the core of this paradigm is a holistic architecture encompassing perception, planning, decision-making, and control. Among these, perception serves as the cornerstone, enabling the vehicle to construct a real-time and structured representation of its complex, dynamic surroundings [4]. Vision-based environmental perception [5–8], in particular, has become indispensable due to its rich semantic information and cost-effectiveness, underpinning a broad spectrum of downstream tasks. Central to this perceptual pipeline is object detection, a critical capability that facilitates the identification and localization of surrounding agents and infrastructure elements. As the principal mechanism by which autonomous systems interpret the visual world, object detection transcends a functional role and constitutes a safety-critical technology. It ensures collision avoidance, trajectory planning, and regulation-compliant navigation across diverse traffic scenarios [9]. Therefore, its reliability is fundamental to large-scale deployment and societal acceptance of autonomous driving.

Adverse weather remains one of the most critical barriers to the safe deployment of autonomous vehicles. Numerous real-world incidents have underscored the vulnerability of perception systems when confronted with degraded visibility [10–12]. For example, heavy rain can create specular reflections and water spray that obscure lane markings, while dense fog reduces contrast and blurs distant objects, and nighttime illumination often results in sensor noise and incomplete detection of pedestrians or obstacles. Such environmental factors directly impair core perception modules (i.e., object detection, lane keeping, and semantic segmentation), leading to cascading failures in planning and control. While state-of-the-art vision-based models achieve near-human performance under clear-weather conditions, their accuracy deteriorates sharply in adverse environments, raising serious concerns about the reliability of autonomous systems in safety-critical scenarios. Motivated by these challenges, this work focuses on enhancing object detection robustness under adverse weather.

Driven by the extraordinary success of deep learning [13], object detection has experienced significant evolution over the past decade, transitioning from early region-based approaches to highly efficient, end-to-end trainable frameworks. Notable milestones include the introduction of the two-stage Region-based Convolutional Neural Network (R-CNN) family [14], which markedly advanced detection accuracy. This was followed by the development of one-stage detectors such as the You Only Look Once (YOLO) series [15] and the Single Shot MultiBox Detector (SSD) [16], which attained real-time performance while maintaining competitive accuracy. Recent advances, including YOLOv5, YOLOv8 and Faster R-CNN with feature pyramid networks (FPN) [17], have further elevated the performance of object detectors under standard conditions, making them integral components of modern autonomous driving perception stacks. Despite their impressive capabilities, these models are predominantly trained and evaluated on datasets collected under clear-weather, daylight conditions, which limits their robustness in real-world environments characterized by adverse weather and low-illumination scenarios. Rain, fog, snow, and nighttime lighting can severely degrade image quality, introduce visual artifacts, and obscure critical scene elements, resulting in pronounced degradation of detection accuracy [18,19]. This vulnerability poses a significant challenge to the deployment of autonomous vehicles in all-weather, all-day conditions and underscores the urgent need for perception systems that are resilient to environmental variability.

To address the vulnerability of object detection models under adverse environmental conditions, we propose a novel two-stage framework designed to enhance robust perception in autonomous driving scenarios. In the first stage, a lightweight Pix2Pix-based generative adversarial network (LP-GAN) is employed for image-to-image translation, reconstructing clear-weather visual features from degraded inputs captured under rain, fog, and nighttime scenarios. In the second stage, the translated images are processed by YOLOv5, yielding substantial gains in detection accuracy and robustness across diverse challenging conditions. Extensive experiments in the CARLA simulator demonstrate the efficacy of the proposed method, underscoring its potential to enhance machine perception while simultaneously providing interpretable visualizations that support human understanding and decision-making. By integrating weather-invariant image restoration with high-performance detection, our framework offers a practical and effective pathway toward achieving resilient, all-weather perception for safe and reliable autonomous driving.

2. Related works

Adverse weather conditions, such as rain, fog, and low-light environments, pose significant challenges to vision-based object detection systems deployed in autonomous vehicles. At the physical level, such conditions substantially impair image fidelity by introducing diverse and scene-dependent visual distortions. For instance, rain introduces high-frequency noise and occlusions through raindrops and streaks on the lens, while fog and haze lead to low-contrast imagery due to light scattering and absorption, which can be modeled using atmospheric scattering models [20,21]. These sensor-level degradations exert a detrimental influence on the reliability and accuracy of subsequent perception algorithms. At the algorithmic level, adverse weather often blurs object contours, diminishes texture and edge fidelity, and weakens semantic cues, thereby compromising feature representation quality and lowering detection confidence [18]. Recent empirical studies have quantified these effects. For example, Mușat et al. [11] reported that heavy rainfall can cause the mean Average Precision (mAP) of state-of-the-art object detectors to drop by more than 30% in urban driving scenes, highlighting the urgent need for models that can generalize beyond ideal visual conditions. These findings underscore the critical importance of addressing weather-induced perception degradation as a prerequisite for achieving reliable and safe autonomous navigation in real-world environments. Research on autonomous driving under adverse weather can be broadly categorized into three directions: enhancing detector robustness, restoring or enhancing degraded images, and integrating restoration with perception tasks.

To mitigate the detrimental effects of adverse weather on object detection, a substantial body of research has focused on enhancing the robustness of detection architectures through direct model-level improvements. Several studies have introduced architectural enhancements to widely adopted detectors such as YOLO and Faster R-CNN, integrating weather-aware components and adaptive attention mechanisms. For example, Chen et al. [22] introduced a domain attention module into Faster R-CNN to dynamically adjust feature extraction based on environmental cues, while Ding et al. [23] designed a cross-fusion YOLO (CF-YOLO) with a plug-and-play feature aggregation module to handle the vagueness, distortion, and occlusion problems introduced by snow. Although these models show improved resilience under known weather conditions, their performance often degrades significantly when encountering unseen or compound weather effects. This limitation arises primarily from their reliance on large-scale, weather-diverse training datasets, the acquisition of which is both resource-intensive and logistically demanding.

To address generalization limitations, recent efforts have turned to multi-task learning and domain adaptation techniques to extract weather-invariant representations. Domain adaptive detectors, such as DA-Faster R-CNN [24] and GPA R-CNN [25] employ adversarial training strategies to align feature distributions between clear and adverse weather domains. Other works such as AOD-net [26] and TogetherNet [27] integrate auxiliary tasks such as image enhancement or semantic segmentation to improve robustness through joint optimization. Fang et al. [28] propose an end-to-end architecture that integrates dehazing with detection. The framework employs an attention fusion module and a self-supervised haze-robust loss, optimized via interval iterative training, and achieves superior detection performance in real-world hazy scenes on benchmarks such as RTTS and VOChaze. However, these approaches often incur substantial computational overhead due to additional loss terms and network branches, and they fail to recover interpretable, human-readable clear-weather imagery, which limits their utility in transparency-critical applications like autonomous driving.

In contrast to model-only robustness strategies, a growing line of research integrates image restoration or enhancement into the detection pipeline to jointly address degradation and recognition under adverse weather. Traditional approaches to adverse weather image enhancement and restoration are primarily based on physical models that explicitly characterize the underlying image degradation mechanisms. The atmospheric scattering model [29,20] forms the theoretical foundation for most dehazing algorithms, characterizing the observed hazy image as a combination of direct attenuation and airlight. He et al.’s seminal work on the Dark Channel Prior (DCP) [30] exploits statistical priors on outdoor haze-free images to estimate transmission maps and recover clear scenes, marking a milestone in single-image dehazing. Similarly, Retinex theory-based methods [31,32] have been widely adopted for low-light enhancement by decomposing images into illumination and reflectance components to correct uneven lighting. However, these physics-based techniques often rely on strong assumptions, such as homogeneous haze distribution or Lambertian surfaces that are violated in dynamic real-world scenarios, leading to suboptimal results under complex weather conditions or heterogeneous illumination. In contrast, recent advances in deep learning, particularly generative adversarial networks (GANs) [33], have demonstrated remarkable advantages for weather-related image restoration. Despite these advances, existing approaches still exhibit several limitations. Physics-based methods rely on strong assumptions and often fail in complex, real-world weather conditions, while GAN-based restoration methods have mainly focused on producing visually plausible images without explicitly evaluating or improving downstream perception tasks such as object detection. 
To the best of our knowledge, no prior work has proposed a GAN-based image translation framework that simultaneously enhances visual quality and directly improves perception performance under diverse adverse weather conditions. This gap motivates the development of our LP-GAN, which integrates restoration and perception to address both visual degradation and detection challenges in a unified framework.

3. Method

This study aims to improve the robustness and accuracy of object detection for autonomous vehicles under adverse weather conditions, while providing interpretable visualizations to support human interpretability and decision-making processes. To this end, a deep learning-based object detection model is presented in Section 3.1, followed by the development of our LP-GAN framework in Section 3.2, which performs image-to-image translation to enable robust object detection under adverse weather conditions.

3.1 Object detection in autonomous driving

Accurate and real-time object detection is a prerequisite for autonomous driving systems, facilitating safe navigation and autonomous decision-making in complex and dynamic environments. Among the various detection frameworks, the YOLO family has established itself as a highly influential paradigm owing to its optimal trade-off between speed and accuracy. In this work, we adopt YOLOv5, a lightweight yet high-performance object detector, as the core detection backbone, due to its architectural efficiency and adaptability to embedded automotive platforms. YOLOv5 follows a modular and scalable design, systematically divided into the three stages shown in Fig 1: the Backbone for hierarchical feature extraction, the Neck for multi-scale feature aggregation, and the Head for dense object prediction. Each component is meticulously designed to optimize computational efficiency while preserving fine-grained semantic information, rendering the model well-suited for real-time autonomous driving applications.

The backbone of YOLOv5 serves to extract comprehensive spatial-semantic features from input images. It is based on CSPDarknet, an evolution of Darknet-53, incorporating the Cross Stage Partial Network (CSPNet) architecture [34]. CSPNet is designed to reduce computational overhead while maintaining representational capacity. The core idea is to partition the feature map of the base layer into two parts: one part undergoes a series of convolutional transformations, while the other is bypassed and later fused. This structure promotes gradient propagation and mitigates redundant gradient information, resulting in improved training stability and reduced computational cost in terms of floating-point operations (FLOPs). Each CSP block can be formulated as:

$y = \mathrm{Concat}\big(\mathcal{F}(x_1),\, x_2\big)$  (1)

where $\mathcal{F}(x_1)$ denotes the intermediate features propagated through stacked convolutional layers with non-linear activations, $\mathcal{F}(\cdot)$ encapsulates a sequence of transformations comprising convolution, batch normalization, and activation functions such as LeakyReLU, $x_2$ denotes the identity-mapped feature map via a residual shortcut connection, and $y$ is the aggregated output that retains both low-level and high-level representations. The Focus module, unique to YOLOv5, spatially reorganizes the input image by slicing and interleaving spatial patches to embed spatial information into the channel dimension:

$F_{\mathrm{out}} = \mathrm{Concat}\big(X_{0,0},\, X_{1,0},\, X_{0,1},\, X_{1,1}\big)$  (2)

where $X_{i,j}$ denotes sub-sampled slices of the input with stride 2 starting from pixel offset $(i, j)$, and $F_{\mathrm{out}}$ is the resultant feature tensor.
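To make the backbone operations concrete, the split-transform-fuse pattern of Eq. (1) and the stride-2 slicing of Eq. (2) can be sketched in NumPy. The `conv_transform` below is a toy stand-in for the actual Conv-BN-LeakyReLU stack (an assumption for brevity; only the LeakyReLU non-linearity is reproduced):

```python
import numpy as np

def conv_transform(x):
    # Toy stand-in for the stacked Conv-BN-LeakyReLU transform F(.)
    return np.where(x > 0, x, 0.1 * x)  # LeakyReLU only, for illustration

def csp_block(x):
    """Eq. (1): split channels, transform one half, bypass the other, fuse."""
    c = x.shape[0] // 2
    x1, x2 = x[:c], x[c:]                       # partition the feature map
    return np.concatenate([conv_transform(x1), x2], axis=0)

def focus(x):
    """Eq. (2): move stride-2 spatial slices into the channel dimension."""
    return np.concatenate([
        x[:, 0::2, 0::2], x[:, 1::2, 0::2],
        x[:, 0::2, 1::2], x[:, 1::2, 1::2],
    ], axis=0)

feat = np.random.randn(8, 16, 16)
out = csp_block(feat)            # same shape; half the channels transformed
img = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
patches = focus(img)             # (3, 4, 4) -> (12, 2, 2)
```

Note that the bypassed half of the CSP input reaches the output unchanged, and the Focus rearrangement is lossless: every pixel survives, with spatial resolution traded for channel depth.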

To preserve multi-scale contextual information, the neck integrates a FPN and a Path Aggregation Network (PANet) [35], facilitating both top-down and bottom-up information flows. The top-down pathway of FPN is expressed as:

$P_l = \mathrm{Conv}(C_l) + \mathrm{Up}(P_{l+1})$  (3)

and the complementary bottom-up enhancement via PANet is given by:

$N_{l+1} = \mathrm{Conv}(N_l) + P_{l+1}$  (4)

where $C_l$ denotes the original feature maps from consecutive backbone layers, $P_l$ and $N_l$ are the top-down and bottom-up aggregated features, and $\mathrm{Conv}(\cdot)$ and $\mathrm{Up}(\cdot)$ represent a convolutional transformation and an upsampling operation typically implemented via interpolation or transposed convolution. Finally, the detection head produces dense predictions across multiple grid scales, each generating bounding boxes, objectness scores, and class probabilities. The output tensor is structured as $S \times S \times A \times (5 + C)$, where $S$ denotes the grid size along spatial dimensions, $A$ is the number of anchors per grid cell (typically 3), 5 represents the bounding box parameters $(x, y, w, h, \mathrm{conf})$, and $C$ is the number of object classes. The bounding box regression is parametrized as:

$b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w\, e^{t_w}, \quad b_h = p_h\, e^{t_h}$  (5)

where $t_x, t_y, t_w, t_h$ are the raw network outputs, $\sigma(\cdot)$ is the sigmoid function to constrain coordinates to $[0, 1]$, $(c_x, c_y)$ denote the grid offset corresponding to the location of the prediction cell, $(p_w, p_h)$ are the anchor dimensions, and $(b_x, b_y, b_w, b_h)$ are the final decoded bounding box parameters in image coordinates. Furthermore, YOLOv5 supports predictions at three different scales (e.g., 80 × 80, 40 × 40, 20 × 20 for an input size of 640 × 640), allowing the network to detect both large and small objects concurrently.
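The decoding of Eq. (5) translates directly to code. The `stride` factor mapping grid units to pixels is an assumption about the head's output scale, and the released YOLOv5 code uses a slightly different power-of-sigmoid variant of this classic parametrization:

```python
import math

def decode_box(t, cell, anchor, stride=8.0):
    """Decode raw outputs t = (tx, ty, tw, th) for a grid cell (cx, cy)
    and anchor (pw, ph) into image-space (bx, by, bw, bh), per Eq. (5)."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = (sig(tx) + cx) * stride      # box center x, in pixels
    by = (sig(ty) + cy) * stride      # box center y, in pixels
    bw = pw * math.exp(tw)            # width scaled from the anchor prior
    bh = ph * math.exp(th)            # height scaled from the anchor prior
    return bx, by, bw, bh

# a zero raw output sits at the cell center with the anchor's exact size
box = decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(32.0, 64.0))
```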

3.2 Lightweight Pix2Pix-based generative adversarial network

GANs constitute a prominent class of generative models that learn complex data distributions through adversarial interplay between a generator and a discriminator. Since the seminal work by Goodfellow et al. [36], GANs have demonstrated remarkable success in diverse image-to-image translation tasks, including image synthesis, domain adaptation, and image restoration. However, conventional GAN architectures are often ill-suited for structured translation tasks where the output image must preserve spatial alignment with the input, such as weather condition translation for autonomous driving perception. To address this, the Pix2Pix GAN framework introduced by Isola et al. [37] extends the traditional GAN by conditioning both the generator and discriminator on paired input images. This allows the model to learn a direct mapping from a source domain (e.g., foggy, rainy, or nighttime imagery) to a target domain (e.g., clear-weather scenes) while maintaining structural consistency. Specifically, the Pix2Pix architecture employs a U-Net-based generator that leverages skip connections to retain fine-grained spatial details, alongside a PatchGAN discriminator that evaluates local realism over image patches, thereby enhancing texture fidelity. While highly effective, the original Pix2Pix model incurs considerable computational cost due to its dense convolutional layers and symmetric encoder-decoder design, thereby constraining real-time deployment on embedded autonomous driving platforms. To address this limitation, we propose LP-GAN, tailored for adverse-weather-to-clear-weather image translation with computational efficiency as a central design principle. By introducing architectural simplifications such as depthwise separable convolutions, attention-guided skip connections, and a reduced channel base, we retain the translation fidelity of the original model while dramatically reducing parameter count and inference latency.
This lightweight design is particularly advantageous for real-time perception stacks in autonomous vehicles, where low-latency, high-throughput image enhancement is imperative for downstream tasks such as object detection and tracking.

The LP-GAN formulates image-to-image translation as a supervised conditional generation task, grounded in a carefully designed objective function that combines adversarial learning with reconstruction fidelity. The architecture of the proposed LP-GAN is shown in Fig 2. Formally, let $x$ denote an image from the source domain (e.g., rainy or foggy conditions) and $y$ represent its corresponding ground-truth image from the target domain (i.e., clear-weather condition). The generator $G$ aims to synthesize an output image $G(x)$ that is both visually realistic and structurally aligned with $y$, while the discriminator $D$ seeks to distinguish between real image pairs $(x, y)$ and synthetic pairs $(x, G(x))$. The full loss function of the LP-GAN is defined as a linear combination of an adversarial loss $\mathcal{L}_{\mathrm{cGAN}}$ and a reconstruction loss $\mathcal{L}_{L_1}$, given by

$\mathcal{L}_{\mathrm{LP\text{-}GAN}} = \mathcal{L}_{\mathrm{cGAN}}(G, D) + \lambda\, \mathcal{L}_{L_1}(G)$  (6)

where $\lambda$ is a weighting coefficient that balances perceptual realism and pixel-level accuracy. The adversarial loss is formulated as a conditional GAN loss:

$\mathcal{L}_{\mathrm{cGAN}}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]$  (7)

which encourages $G(x)$ to be indistinguishable from the real target $y$ under the conditional input $x$. To enforce low-frequency and structural accuracy, a pixel-wise $L_1$ loss is applied between the synthesized and ground-truth images:

$\mathcal{L}_{L_1}(G) = \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_1\big]$  (8)
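Evaluated on pre-computed discriminator outputs, the combined objective of Eqs. (6)-(8) reduces to a few lines. Treating `d_real` and `d_fake` as given probabilities, rather than wiring up actual networks, is a simplification for illustration; `lam = 1.5` matches the coefficient used in our experiments:

```python
import numpy as np

def lpgan_objective(y, g_out, d_real, d_fake, lam=1.5):
    """Value of Eq. (6) = cGAN term (Eq. 7) + lam * L1 term (Eq. 8).
    d_real, d_fake: discriminator probabilities for (x, y) and (x, G(x))."""
    eps = 1e-8                                   # numerical safety in the logs
    adv = np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))
    l1 = np.mean(np.abs(y - g_out))              # pixel-wise reconstruction error
    return adv + lam * l1, l1

y = np.ones((4, 4))
g_out = np.full((4, 4), 0.75)
total, l1 = lpgan_objective(y, g_out,
                            d_real=np.array([0.9]), d_fake=np.array([0.2]))
```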

To enhance computational efficiency and promote deployment feasibility on resource-constrained platforms, LP-GAN leverages depthwise separable convolutions (DSCs) as a principal mechanism for parameter reduction, replacing conventional convolutional layers in both the encoder and decoder pathways, as shown in Tables 1 and 2. Formally, a standard convolution of an input tensor of size $H \times W \times C_{in}$ with $C_{out}$ filters incurs a computational cost of $H \cdot W \cdot C_{in} \cdot C_{out} \cdot K^2$, where $K$ is the kernel size. In contrast, a depthwise separable convolution decomposes this operation into a depthwise stage of cost $H \cdot W \cdot C_{in} \cdot K^2$ and a pointwise stage of cost $H \cdot W \cdot C_{in} \cdot C_{out}$, yielding substantial savings in both parameters and FLOPs. Furthermore, the U-Net structure in the generator is preserved via symmetric skip connections, enabling the decoder to recover high-resolution spatial details lost during downsampling. For the discriminator, we use a non-separable initial convolution to ensure robust low-level feature extraction, and only downsample spatial resolution where necessary, preserving fine-grained details crucial for PatchGAN-style discrimination. These lightweight architectural choices make our discriminator well-suited for real-time applications or deployment in resource-constrained environments without compromising adversarial training stability.
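The cost accounting above is easy to verify numerically; the layer dimensions below are arbitrary examples, not the actual LP-GAN layer sizes:

```python
def standard_conv_cost(h, w, c_in, c_out, k):
    # H * W * C_in * C_out * K^2 multiply-accumulates
    return h * w * c_in * c_out * k * k

def separable_conv_cost(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k        # one K x K filter per input channel
    pointwise = h * w * c_in * c_out        # 1 x 1 convolution to mix channels
    return depthwise + pointwise

std = standard_conv_cost(64, 64, 128, 128, 3)
sep = separable_conv_cost(64, 64, 128, 128, 3)
ratio = sep / std   # equals 1/C_out + 1/K^2; roughly 0.12 here, i.e. ~8.4x fewer FLOPs
```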

4. Numerical experiments

In this section, we first evaluate the proposed LP-GAN on a large-scale dataset generated using the CARLA simulator, aiming to translate images captured under adverse weather conditions into their corresponding clear-weather counterparts. The translated images are then fed into YOLOv5 to assess whether the translation enhances detection robustness and accuracy under challenging environmental conditions.

4.1 Evaluation metrics

Evaluating the quality of synthesized images is an open and difficult problem [38]. To quantitatively evaluate the quality of image translation from adverse to clear-weather conditions, we adopt three widely recognized metrics: Peak Signal-to-Noise Ratio (PSNR) [39], Structural Similarity Index Measure (SSIM) [40], and Learned Perceptual Image Patch Similarity (LPIPS) [41]. These metrics jointly capture pixel-level fidelity, structural consistency, and perceptual realism, providing a comprehensive assessment of translation performance.

(1) PSNR.

PSNR evaluates the reconstruction quality by measuring the ratio between the maximum possible pixel intensity and the mean squared error (MSE) between the generated image $\hat{I}$ and the reference image $I$. It is defined as:

$\mathrm{PSNR} = 10 \log_{10}\!\left( \dfrac{\mathrm{MAX}^2}{\tfrac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} \big( I(i,j) - \hat{I}(i,j) \big)^2} \right)$  (9)

where $\mathrm{MAX}$ denotes the dynamic range of pixel values (e.g., 255 for 8-bit images), and $H$ and $W$ are the image height and width. A higher PSNR value indicates better fidelity with respect to the reference image.
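Eq. (9) maps directly to a short function:

```python
import numpy as np

def psnr(ref, gen, max_val=255.0):
    """Peak signal-to-noise ratio of Eq. (9), in dB."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8))
# the maximal all-255 error against an all-0 reference gives MSE = 255^2, i.e. 0 dB
worst = psnr(ref, np.full((8, 8), 255.0))
```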

(2) SSIM.

SSIM aims to quantify image quality degradation based on changes in structural information, luminance, and contrast. It is computed over local image patches and defined as:

$\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$  (10)

where $\mu$, $\sigma^2$, and $\sigma_{xy}$ denote the mean, variance, and covariance of the image patches, respectively, and $C_1$ and $C_2$ are small constants to stabilize the division. SSIM values range from 0 to 1, with higher values indicating greater structural similarity.
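A single-window version of Eq. (10), computed once over the whole image, illustrates the formula; production implementations average it over local sliding windows (e.g., an 11 × 11 Gaussian window), which this sketch omits. The stabilizers $C_1 = (0.01\,\mathrm{MAX})^2$ and $C_2 = (0.03\,\mathrm{MAX})^2$ follow common practice:

```python
import numpy as np

def ssim_global(x, y, max_val=255.0):
    """Eq. (10) evaluated once over the full image (simplified)."""
    c1 = (0.01 * max_val) ** 2              # stabilizing constants
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

a = np.random.rand(16, 16) * 255
# an image compared against itself scores exactly 1
```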

(3) LPIPS.

LPIPS compares deep feature representations extracted from pretrained convolutional neural networks to assess perceptual similarity between images. Given deep features $\phi_l(\cdot)$ at layer $l$, LPIPS is defined as:

$\mathrm{LPIPS}(x, \hat{x}) = \sum_{l} \dfrac{1}{H_l W_l} \sum_{h,w} \big\lVert w_l \odot \big( \phi_l(x)_{h,w} - \phi_l(\hat{x})_{h,w} \big) \big\rVert_2^2$  (11)

where $w_l$ is a learned weight vector and $\odot$ denotes element-wise multiplication. LPIPS reflects human perceptual judgments more accurately than pixel-wise metrics; lower scores indicate higher perceptual similarity. Together, these metrics form a robust evaluation framework to quantitatively analyze the effectiveness of adverse-to-clear weather image translation in terms of both low-level fidelity and high-level perceptual quality.
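The weighted feature distance of Eq. (11) can be sketched on pre-extracted activations. The random arrays below merely stand in for pretrained-CNN features; real LPIPS uses AlexNet or VGG activations with learned per-channel weights, which are not reproduced here:

```python
import numpy as np

def lpips_like(feats_x, feats_y, weights):
    """Eq. (11) over lists of (C, H, W) feature maps, one per layer l."""
    dist = 0.0
    for fx, fy, w in zip(feats_x, feats_y, weights):
        # channel-wise unit normalization, as in the original LPIPS recipe
        fx = fx / (np.linalg.norm(fx, axis=0, keepdims=True) + 1e-10)
        fy = fy / (np.linalg.norm(fy, axis=0, keepdims=True) + 1e-10)
        diff = (w[:, None, None] * (fx - fy)) ** 2
        dist += diff.sum(axis=0).mean()     # spatial average of the weighted L2
    return dist

f1 = [np.random.randn(4, 8, 8) for _ in range(2)]   # stand-in "features"
w = [np.ones(4) for _ in range(2)]
# distance of a feature stack to itself is zero; any perturbation increases it
```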

To comprehensively evaluate the impact of image translation on downstream object detection performance, we adopt the mean Average Precision (mAP) metric, a widely used standard in the object detection community. The mAP is derived from several fundamental concepts, including Intersection over Union (IoU), Precision, and Recall, and it provides a holistic measurement of detection accuracy across multiple object categories and confidence thresholds. The IoU measures the spatial overlap between a predicted bounding box $B_p$ and the ground-truth bounding box $B_{gt}$. It is defined as

$\mathrm{IoU} = \dfrac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}$  (12)

A prediction is considered a true positive if the IoU exceeds a certain threshold (e.g., 0.5), and a false positive otherwise. Given a set of predicted and ground-truth boxes, the Precision $P$ and Recall $R$ are defined as:

$P = \dfrac{TP}{TP + FP}, \qquad R = \dfrac{TP}{TP + FN}$  (13)

where $TP$, $FP$, and $FN$ denote the number of true positives, false positives, and false negatives, respectively. The Average Precision (AP) summarizes the precision-recall curve for a single class by computing the area under the curve:

$\mathrm{AP} = \int_{0}^{1} P(R)\, \mathrm{d}R$  (14)

In practice, AP is often approximated by discrete summation over recall levels:

$\mathrm{AP} \approx \sum_{k} \big( R_k - R_{k-1} \big) P_k$  (15)

where $P_k$ and $R_k$ are the precision and recall at the $k$-th threshold. The mean Average Precision is computed by averaging the AP scores over all $C$ object categories:

$\mathrm{mAP} = \dfrac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c$  (16)

In our evaluation, we adopt mAP@0.5, where a detection is considered correct if IoU > 0.5, and mAP@[0.5:0.95], which averages AP over multiple IoU thresholds ranging from 0.5 to 0.95 in steps of 0.05, following COCO evaluation standards.
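A minimal sketch of how Eqs. (12)-(16) compose, assuming boxes given as (x1, y1, x2, y2) corners and precision/recall lists pre-sorted by ascending recall:

```python
def iou(a, b):
    """Eq. (12) for axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(precisions, recalls):
    """Discrete AP of Eq. (15): precision weighted by recall increments."""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

def mean_ap(ap_per_class):
    """Eq. (16): average AP over all object categories."""
    return sum(ap_per_class) / len(ap_per_class)

perfect = iou((0, 0, 10, 10), (0, 0, 10, 10))      # identical boxes -> 1.0
half = iou((0, 0, 10, 10), (0, 5, 10, 15))         # 50 / 150 overlap -> 1/3
ap = average_precision([1.0, 0.5], [0.5, 1.0])     # 1.0*0.5 + 0.5*0.5 = 0.75
m = mean_ap([0.75, 0.25])
```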

4.2 Results on image-to-image translation

To facilitate the training and evaluation of the proposed LP-GAN framework, we developed a large-scale dataset generated from the CARLA autonomous driving simulator. A total of 40,000 high-resolution RGB images were collected, evenly distributed across four representative weather conditions: clear, rainy, foggy, and nighttime (10,000 images per condition). In the training process, we adopt a batch size of 4, a total of 150 training epochs, a learning rate of , and set the loss balance coefficient $\lambda$ to 1.5. Fig 3 illustrates the progressive refinement of the generator’s output over the training process. At the initial epoch (epoch = 1), the generator merely captures the coarse spatial layout of the scene. By epoch 50, the generator demonstrates substantial translation capability, effectively mitigating weather-induced degradations such as rain streaks, fog density, and low-light conditions. However, the results remain visually ambiguous in distant regions, with fine structures such as traffic signs, headlight reflections, and background facades appearing overly smoothed or poorly defined, highlighting the model’s current limitations in capturing high-frequency details. After 150 epochs of training, the model demonstrates a notable enhancement in fine-grained realism. The restored images not only preserve global scene geometry but also exhibit significant improvement in reconstructing delicate visual elements, such as traffic light signals, reflective surfaces, and architectural textures at long range. This progression suggests that LP-GAN benefits from a sufficient number of training iterations to stabilize adversarial training dynamics and effectively learn the cross-domain mapping from degraded to clear-weather domains.

Fig 3. Evolution of generator outputs during training on (a) rainy, (b) foggy and (c) nighttime scenes.

https://doi.org/10.1371/journal.pone.0333928.g003

Figs 4-6 present qualitative results of the trained LP-GAN under three representative adverse weather conditions: rain, fog, and nighttime. In the rainy scenario, we simulate a heavy downpour rather than light rain or drizzle. This scenario features prominent rain streaks, water splashes, puddles, and strong surface reflections, which significantly degrade scene visibility. The foggy scenario corresponds to a medium-density ground fog, where distant objects appear blurred and desaturated with reduced contrast and weakened edge information, leading to a marked degradation in scene visibility. The nighttime scenario reflects post-sunset illumination, with significantly reduced overall brightness, deeper shadows, and vehicle headlights and streetlights as the dominant light sources, resulting in low contrast, increased noise, and greater difficulty in perceiving lane markings and object boundaries. As shown in Fig 4, LP-GAN effectively eliminates rain-induced degradations such as atmospheric water streaks and reflective noise on road surfaces. The translated images restore clear sky appearance and road texture continuity, enabling downstream object detectors to identify lane markings and small obstacles more reliably. In Fig 5, under dense fog conditions, the model successfully reconstructs structural details of distant vehicles and buildings that are heavily obscured in the input. Notably, LP-GAN preserves semantic coherence while enhancing visibility in low-contrast regions, which is critical for long-range perception in autonomous navigation tasks. Fig 6 demonstrates the model’s robustness under nighttime scenes, where challenges arise from low illumination and glare. LP-GAN is able to recover ambient lighting, highlight reflective surfaces such as car bodies and signposts, and restore the color consistency of traffic signals. 
This enhances the visibility of both static infrastructure and dynamic agents, directly benefiting real-time decision-making in urban environments. Overall, the qualitative results demonstrate that LP-GAN generates visually realistic and semantically consistent clear-weather reconstructions across diverse adverse conditions. This capability facilitates improved downstream perception, as detailed subsequently.

Fig 4. Visual results of rainy-to-clear image translation using LP-GAN, with each column showing (from top to bottom) the rainy input, the generated output, and the clear-weather ground truth.

https://doi.org/10.1371/journal.pone.0333928.g004

Fig 5. Visual results of foggy-to-clear image translation using LP-GAN, with each column showing (from top to bottom) the foggy input, the generated output, and the clear-weather ground truth.

https://doi.org/10.1371/journal.pone.0333928.g005

Fig 6. Visual results of nighttime-to-clear image translation using LP-GAN, with each column showing (from top to bottom) the nighttime input, the generated output, and the clear-weather ground truth.

https://doi.org/10.1371/journal.pone.0333928.g006

To further validate the effectiveness of our proposed LP-GAN, we conduct a comparative evaluation against two baseline models: a standard GAN and a vanilla autoencoder (AE). As shown in Fig 7, LP-GAN consistently outperforms both baselines across three widely adopted image quality metrics (i.e., PSNR, SSIM, and LPIPS), demonstrating superior fidelity and perceptual realism in translated images.

Fig 7. Comparative performance of LP-GAN, standard GAN, and AE on (a) rainy, (b) foggy, and (c) nighttime scenes.

https://doi.org/10.1371/journal.pone.0333928.g007

Across all evaluated metrics, a clear performance ranking emerges: LP-GAN consistently achieves the highest fidelity, followed by the standard GAN, while the autoencoder attains the lowest scores. This is expected, as the autoencoder lacks adversarial training and thus tends to produce overly smoothed outputs with diminished structural detail. The standard GAN, despite leveraging adversarial learning, is not explicitly guided by reconstruction objectives and can therefore produce textural inconsistencies and spatial distortions. By integrating adversarial loss with pixel-wise and perceptual consistency objectives, and by employing depthwise separable convolutions for efficient feature modeling, LP-GAN achieves both robust translation fidelity and computational efficiency. Consequently, this hybrid design allows LP-GAN to retain semantic integrity while reconstructing fine-grained image details, even in the presence of severe degradations.
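The efficiency claim for depthwise separable convolutions can be made concrete by counting weights: a standard convolution learns one k×k kernel per input/output channel pair, whereas the separable form learns one k×k depthwise filter per input channel plus a 1×1 pointwise mixing layer. The layer sizes below are illustrative only (the paper's exact architecture is not reproduced here):

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: one k x k kernel per (input, output) channel pair.
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    # Depthwise separable: a k x k depthwise filter per input channel,
    # followed by a 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Illustrative layer (not from the paper): 3x3 kernel, 128 -> 256 channels.
std = conv_params(3, 128, 256)      # 294912 weights
ds = ds_conv_params(3, 128, 256)    # 33920 weights
print(std, ds, round(std / ds, 1))  # 294912 33920 8.7
```

For 3×3 kernels the saving approaches a factor of 9 as the channel counts grow, which is the source of LP-GAN's "lightweight" character.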

Furthermore, we observe a variation in restoration performance across weather conditions: quality is generally highest for rainy scenes, lower for foggy scenes, and lowest for nighttime scenes. This trend reflects the increasing inherent complexity of the degradations associated with each condition. In rainy conditions, high-frequency artifacts such as streaks and reflections are spatially localized, allowing them to be effectively mitigated through targeted filtering operations. Fog, however, causes global contrast attenuation due to light scattering, making it more difficult to recover depth and texture information. Among the conditions considered, nighttime scenes present the most severe challenge, as extreme low light and pronounced non-uniform illumination markedly impair feature visibility and color fidelity. Collectively, these observations underscore the necessity of developing translation networks capable of adaptively accommodating the heterogeneous characteristics of adverse weather degradations.

4.2 Results on object detection

In this section, we investigate the potential benefits of the proposed LP-GAN in enhancing object detection robustness under adverse weather conditions in autonomous driving. To establish a reliable detection baseline, we first train the YOLOv5 model using a clean subset of our CARLA dataset, which consists exclusively of clear-weather images. The model is trained for 100 epochs using the Adam optimizer with an initial learning rate of , a batch size of 16, and weight decay set to . Cosine annealing is employed to adjust the learning rate dynamically, and data augmentation techniques such as mosaic augmentation, random horizontal flipping, and color jittering are utilized to improve generalization.
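The cosine annealing schedule mentioned above decays the learning rate smoothly from its initial value toward a floor over the training horizon. A minimal sketch of the schedule follows; the `lr_max` value is a placeholder, since the paper's initial learning rate is not reproduced in this excerpt:

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Cosine annealing: decay smoothly from lr_max at epoch 0
    down to lr_min at the final epoch."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

# Placeholder lr_max; 100 epochs matches the training setup above.
lr_max, epochs = 1e-3, 100
print(cosine_annealed_lr(0, epochs, lr_max))    # lr_max at the start
print(cosine_annealed_lr(50, epochs, lr_max))   # half of lr_max at the midpoint
print(cosine_annealed_lr(100, epochs, lr_max))  # ~0 at the end
```

Compared with step decay, the smooth cosine curve avoids abrupt learning-rate drops, which tends to stabilize the late phase of detector training.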

Although YOLOv5 demonstrates high accuracy under clear-weather conditions, its performance deteriorates markedly when confronted with rainy, foggy, or nighttime scenarios. As illustrated in Fig 8 (left column), the model frequently fails to detect critical objects such as vehicles and traffic lights in adverse weather environments. The degradation primarily arises from domain shifts induced by weather-specific artifacts, including rain-induced occlusions, fog-related visibility attenuation, and the loss of texture and contrast under low-light nighttime conditions. These phenomena severely impair the model’s ability to extract discriminative features and accurately localize objects. In autonomous driving applications, such missed detections can have severe implications, potentially resulting in failure to yield, misinterpretation of traffic signals, or collisions with previously unrecognized obstacles.

To address this challenge, we integrate the LP-GAN module into the perception pipeline. In the proposed pipeline, images acquired under adverse weather are first translated by LP-GAN into photorealistic clear-weather counterparts (Fig 8, right column), which subsequently serve as input to the YOLOv5 detector. This preprocessing step effectively mitigates the domain shift and restores key visual cues, such as the structural integrity of distant vehicles, the visibility of lane markings, and the luminance of traffic signals. As a result, the enhanced pipeline successfully recovers object detection performance under challenging conditions, demonstrating improved detection of both large- and small-scale targets. These results underscore the pivotal contribution of generative weather translation toward enhancing perception robustness in real-world autonomous driving environments.
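The two-stage pipeline composes cleanly: the translator runs as a pure preprocessing step and the detector is untouched. The sketch below shows only this composition; `translator` and `detector` are stand-ins for the trained LP-GAN generator and the YOLOv5 model, and none of the names come from the paper's code:

```python
def weather_robust_detect(image, translator, detector):
    """Translate an adverse-weather frame to a clear-weather view,
    then run the unmodified object detector on the translated image."""
    clear_view = translator(image)
    return detector(clear_view)

# Toy stand-ins purely to show how the stages compose: the "translator"
# relabels the frame as clear, and the "detector" only fires on clear frames.
fake_translator = lambda img: {"frame": img["frame"], "weather": "clear"}
fake_detector = lambda img: [("car", 0.92)] if img["weather"] == "clear" else []

detections = weather_robust_detect(
    {"frame": "rainy_0001", "weather": "rain"}, fake_translator, fake_detector
)
print(detections)  # [('car', 0.92)]
```

Because the detector's weights and interface are unchanged, the translation stage can be enabled or bypassed per frame, for example based on a weather classifier, without retraining the downstream model.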

Fig 8. Object detection results in three weather conditions.

https://doi.org/10.1371/journal.pone.0333928.g008

Quantitative analysis of the object detection performance under different weather conditions is illustrated in Fig 9. Specifically, we compare the mAP of YOLOv5 in two settings: (1) directly applying YOLOv5 to raw adverse-weather images, and (2) applying YOLOv5 to images first translated into clear-weather counterparts by our LP-GAN. For the original adverse-weather images, the mAP values are 0.68 for rainy, 0.62 for foggy, and 0.54 for nighttime scenarios. The comparatively low mAP values underscore the substantial impact of environmental degradations, including rain-induced occlusions, fog-induced visibility attenuation, and nighttime illumination loss. When LP-GAN is employed as a pre-processing module to translate adverse-weather images into clearer views, the detection accuracy improves substantially, reaching 0.81 (rainy), 0.78 (foggy), and 0.76 (nighttime).

Fig 9. mAP comparison under three different adverse weather conditions.

https://doi.org/10.1371/journal.pone.0333928.g009

The observed gains, exceeding 0.1 mAP across all scenarios, demonstrate the efficacy of the LP-GAN enhancement pipeline in counteracting weather-induced performance degradation. By restoring key visual semantics such as object contours, textures, and illumination cues, LP-GAN enables downstream detectors to operate with enhanced robustness and reliability. This performance improvement is especially significant in the context of autonomous driving, where missed detections of critical objects (e.g., vehicles, pedestrians, or traffic signals) under adverse weather can precipitate hazardous decisions. The proposed LP-GAN framework, by improving perception reliability, contributes to safer and more dependable autonomous navigation.

Moreover, from the passenger’s perspective, the image enhancement provided by LP-GAN contributes not only to technical perception improvements but also to the psychological acceptance of autonomous systems. By producing clearer and more interpretable visual outputs even under adverse weather, LP-GAN increases the perceived transparency and reliability of the vehicle’s decision-making process. This, in turn, can significantly reduce passenger anxiety and foster greater trust in the safety and robustness of autonomous driving, especially in challenging real-world scenarios.

Nevertheless, detection performance continues to decline from rainy to foggy and nighttime conditions, even with LP-GAN enhancement, highlighting the persistent difficulty of extreme low-light scenarios. Future work may explore the integration of LP-GAN with multimodal inputs (e.g., LiDAR, thermal imaging), temporal consistency modeling, or low-light-specific enhancement techniques to further address these limitations in the most challenging environments.
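The per-condition gains can be tabulated directly from the mAP values reported above, which also makes the trend explicit: the improvement is largest exactly where the raw detector degrades most.

```python
# mAP values reported in the evaluation above (Fig 9).
map_raw = {"rainy": 0.68, "foggy": 0.62, "nighttime": 0.54}
map_lpgan = {"rainy": 0.81, "foggy": 0.78, "nighttime": 0.76}

for cond in map_raw:
    gain = map_lpgan[cond] - map_raw[cond]
    print(f"{cond:>9}: {map_raw[cond]:.2f} -> {map_lpgan[cond]:.2f} (+{gain:.2f})")
# Gains: +0.13 (rainy), +0.16 (foggy), +0.22 (nighttime),
# so every condition clears the 0.1 mAP threshold cited in the text.
```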

5 Conclusions

In this study, we propose LP-GAN, a lightweight generative adversarial framework tailored for translating adverse weather images under rainy, foggy, and nighttime conditions into their clear-weather counterparts, thereby enhancing the visual input quality for downstream autonomous driving perception tasks. Extensive experiments conducted on a large-scale dataset generated via the CARLA simulator demonstrate that LP-GAN significantly improves image fidelity across multiple perceptual metrics, including PSNR, SSIM, and LPIPS. Furthermore, by integrating LP-GAN with state-of-the-art object detection models such as YOLOv5, we observe consistent improvements in detection accuracy across challenging weather conditions, validating the model’s practical effectiveness in real-world autonomous driving pipelines. In addition to these technical benefits, our approach also enhances transparency and interpretability, thereby improving user trust and reducing passenger anxiety in adverse environments.

Beyond its experimental validation, LP-GAN demonstrates substantial potential for real-world deployment in both commercial and industrial contexts. In autonomous driving applications, it enhances vehicle safety and reliability by mitigating perception failures caused by adverse weather. Within intelligent transportation systems, LP-GAN can facilitate the development of more resilient infrastructure by providing weather-robust visual inputs. Additionally, it serves as a cost-effective tool for data augmentation in industrial testing pipelines, reducing reliance on labor-intensive and expensive real-world adverse-weather data collection.

Future work will focus on extending LP-GAN to support a broader range of perception tasks, including lane detection, depth estimation, and semantic segmentation, within a unified translation–perception framework. Furthermore, integrating uncertainty quantification into the generative process will enable confidence estimates for translated outputs, which are critical for safe decision-making in autonomous systems. Collectively, these enhancements position LP-GAN to bridge the gap between simulation-based research and large-scale industrial deployment, contributing to the next generation of reliable, trustworthy, and commercially viable autonomous driving technologies.
