Abstract
Millimeter-wave (mmWave) radar has become an important research direction in the field of object detection because of its all-day operation, low cost, strong privacy protection, and immunity to harsh weather conditions. Research on millimeter-wave radar object detection is therefore of great practical significance for applications in intelligent security and transportation. However, in multi-target detection scenes, millimeter-wave radar still faces problems such as the inability to effectively distinguish multiple objects and the poor performance of detection algorithms. Focusing on these problems, a new deep-learning-based target detection and classification framework, S2DB-mmWave YOLOv8n, is proposed to achieve higher accuracy. There are three main improvements. First, a novel backbone network was designed by incorporating new convolutional layers and the Simplified Spatial Pyramid Pooling - Fast (SimSPPF) module to strengthen feature extraction. Second, a dynamic up-sampling technique was introduced to improve the model's ability to recover fine details. Finally, a bidirectional feature pyramid network (BiFPN) was integrated to optimize feature fusion, leveraging a bidirectional information transfer mechanism and an adaptive feature selection strategy. A publicly available 5-class mmWave radar heatmap dataset comprising 2,500 annotated images was selected for data modeling and method evaluation. The results show that the S2DB-mmWave YOLOv8n model achieved 93.1% mAP@0.5, 55.8% mAP@0.5:0.95, 89.4% precision, and 90.6% recall, which are 3.3, 1.6, 4.5, and 7.7 percentage points higher than the baseline YOLOv8n network, respectively, without increasing the parameter count.
Citation: Yuan M, Yuan Y, Zhang X, Zhu Z, Zhao C, Gao X, et al. (2025) S2DB-mmWave YOLOv8n: Multi-object detection for millimeter-wave radar using YOLOv8n with optimized multi-scale features. PLoS One 20(9): e0332931. https://doi.org/10.1371/journal.pone.0332931
Editor: Nattapol Aunsri, Mae Fah Luang University, THAILAND
Received: April 20, 2025; Accepted: September 6, 2025; Published: September 19, 2025
Copyright: © 2025 Yuan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This work was supported by the Henan Province Key R&D Special Project (No. 241111212500) and the Henan Province Key R&D and Promotion Special (Technology Tackling Key) Project (No. 232102210181). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Frequency-Modulated Continuous Wave (FMCW) radar has gained significant traction in industrial inspection systems and Advanced Driver Assistance Systems (ADAS), particularly for obstacle detection applications in both indoor and outdoor surveillance scenarios. This adoption surge primarily stems from its inherent advantages, including cost-effectiveness and reliable operational capabilities under adverse atmospheric conditions such as haze, smog, blizzards, and particulate-laden environments. In the autonomous driving domain, object recognition technologies employing red, green, blue (RGB) cameras, mmWave radar, and light detection and ranging (LiDAR) have been widely adopted in automotive ADAS and are increasingly integrated into diverse vehicle platforms, including construction machinery, passenger cars, and commercial trucks [1]. Construction machinery vehicles, for instance, typically operate in dust-laden environments with significant airborne particulates (e.g., sand and mud) or under optically challenging conditions such as nighttime operations, inclement weather, or lens contamination. Under these circumstances, ADAS systems must maintain high-precision real-time detection of personnel and objects. However, the visibility and detection performance of RGB cameras deteriorate substantially in extreme environments, necessitating the adoption of alternative sensing modalities such as mmWave radar and LiDAR [2]. Notably, driven by 5G-enabled advancements, mmWave radar has become more cost-effective than LiDAR, significantly enhancing its practical appeal for automotive applications. In surveillance applications for both indoor and outdoor environments, RGB cameras remain the predominant solution; however, their recognition performance degrades significantly under poor illumination, bad weather, and privacy-sensitive scenarios [3]. Moreover, their detection capability is further impaired in conditions such as the presence of smoke and haze. Collectively, FMCW radar demonstrates greater adaptability to extreme working conditions than RGB cameras and LiDAR, and higher detection accuracy than ultrasonic and infrared sensors [4].
As most of the objects detected using FMCW radar have a micro-Doppler signature, classification of these objects is possible. The classical approach to performing these operations, both indoors and outdoors, involves applying Constant False Alarm Rate (CFAR) thresholds to processed radar signals [5]. The method dynamically calculates the threshold by analyzing the ambient noise power surrounding the radar echo signal and determines the presence of a target based on the computed threshold, which significantly enhances target detection probability and makes it more adaptable to complex cluttered environments. However, in non-uniform environments, the detection performance of the CFAR algorithm deteriorates rapidly. In most cases, CFAR algorithms struggle to solve complex identification tasks correctly. Over time, the CFAR algorithm has been modified and extended into several new variants, such as Cell Averaging CFAR (CA-CFAR) [6], Order Statistics CFAR (OS-CFAR) [7], and Greatest of CFAR (GO-CFAR) [8]. All current CFAR algorithms perform target detection through a reference window on the radar map. However, the use of reference windows reduces detection efficiency and induces model mismatch issues. Recent studies have applied Machine Learning (ML) algorithms to process the collected radar data, where data-driven approaches learn complex nonlinear relationships between noise and targets to adapt to dynamic environments, while automatically extracting multidimensional features to capture subtle target variations. Experimental results demonstrate that ML-based approaches exhibit higher robustness than traditional CFAR thresholds in noisy scenarios.
With the progressive breakthroughs in machine learning and computational power enhancements from hardware innovations, low-cost millimeter-wave radar systems integrated with deep learning architectures have found applications in gesture recognition [9], human imaging and tracking [10], and indoor mapping [11]. The inherent differences between millimeter-wave radar data and conventional camera imagery necessitate specialized representation and processing methods to optimize deep learning performance in radar applications. Currently, learning from radar data in point cloud format has been extensively studied [12,13]. For instance, [12] proposed a semantic segmentation network for radar point clouds, and [13] adapted PointNets for 2D target detection using radar point clouds. However, point cloud generation often relies on filtering and thresholding techniques to eliminate background clutter and noise, which can lead to information loss due to hard-coded filtering algorithms. To address the information degradation, radar data can be transformed into range-angle-Doppler dimensional heatmaps [14], thereby fully utilizing the signal characteristics inherent to millimeter-wave radar returns.
Initially, the most common ML techniques used for target recognition were based on Convolutional Neural Networks (CNN) [15]. Jiang et al. [16] streamlined the target detection workflow in range-azimuth maps by jointly analyzing the range and Doppler dimensions of millimeter-wave radar data. They conducted a comparison experiment that demonstrated the effectiveness of CNNs in such tasks. However, the experimental data were simulated and not tested in real scenarios. Today, real-time object detection methods, such as You Only Look Once (YOLO), have been effectively integrated into millimeter-wave radar signal processing for latency-sensitive applications. Gupta et al. [17] developed a dataset comprising range-azimuth heatmaps of targets detected by FMCW radar, subsequently employing a Darknet53-based YOLOv3 architecture for robust target classification across diverse operational scenarios and dynamic target variations. Lamane et al. [18] proposed a hybrid framework integrating FMCW radar, YOLOv7, and Pix2Pix architectures to improve detection accuracy. This method employs Pix2Pix for dataset denoising in range-azimuth heatmaps, followed by training an enhanced YOLOv7 model on the refined thermal representations. Experimental results demonstrate an improvement in detection performance, but also reveal inadvertent suppression of large-scale targets and small objects during the denoising phase, suggesting potential information loss in extreme target size conditions. Kosuge et al. [19] employed a 2D multiple-input multiple-output (MIMO) radar system achieving an imaging effect comparable to RGB sensors. The data from the sensors were fused and fed into a YOLOv3 model for target detection and classification. The experiments demonstrated that the method maintains viable detection capabilities under visual information scarcity, although privacy concerns inherent to the methodology and suboptimal detection performance for low-resolution small targets remain challenging. Kim et al. [20] proposed a radar-to-image conversion method using a YOLOv2 network with range-azimuth heatmaps, demonstrating the feasibility of cross-modal adaptation for target detection. Raimondi et al. [21] achieved robust detection without background suppression through direct fusion of radar data cubes with YOLOv3 frameworks, though the method validation focused primarily on single-target scenarios, leaving multi-object detection capabilities unverified. Zhang et al. [22] proposed a multi-algorithm classification framework utilizing dual-input range-azimuth-Doppler and range-velocity heatmaps processed through YOLOv4, followed by Cartesian coordinate transformation for spatial localization. While existing studies, as noted by Tao et al. [23], predominantly rely on radar-camera fusion for classification tasks, this work investigates millimeter-wave radar as a standalone sensor, aiming to validate its capability for high-precision multi-target detection in complex scenarios. The advantages and disadvantages of the aforementioned related methods are listed in Table 1.
Because mmWave radar images contain object heatmaps with indistinct boundaries and shapes, the objects are difficult to separate from the background for recognition. To address this issue, we present a multi-object detection framework for mmWave radar thermograms based on an improved YOLOv8n architecture, specifically optimized for radar thermal data through enhanced multi-scale feature extraction mechanisms. The principal contributions of this work can be summarized as follows:
- Advanced convolutional layers integrated with a Simplified Spatial Pyramid Pooling - Fast (SimSPPF) structure are introduced into the backbone network, replacing conventional convolutional operations and the Spatial Pyramid Pooling - Fast (SPPF) module. This configuration mitigates gradient vanishing while reducing information loss, thereby significantly enhancing both feature extraction capabilities and network robustness, particularly when processing low-resolution radar heatmap data.
- To mitigate the loss of fine-grained information during upsampling in YOLOv8n, we propose a novel upsampling module. By adaptively fusing multi-level feature representations, this design enables more precise image detail reconstruction and substantially improves object localization accuracy.
- The neck network incorporates a BiFPN module to optimize multi-scale feature fusion, thereby improving detection performance for objects at varying scales.
- The proposed S2DB-mmWave YOLOv8n model achieves 93.1% mAP@0.5 and 55.8% mAP@0.5:0.95 on the mmWave radar range-azimuth heatmap dataset, surpassing the baseline by 3.3% and 1.6% respectively, with concurrent precision and recall rates of 89.4% and 90.6%, which demonstrates significant improvements in classification accuracy and overall detection performance for mmWave radar systems.
2. Materials and methods
2.1. Introduction to YOLOv8n model
YOLOv8n represents an advanced evolution in object detection algorithms, building upon the successes of its predecessors in the YOLO series. The YOLOv8n network architecture consists of three primary components: the Backbone, the Feature Enhancement Network (Neck), and the Detection Head. Feature extraction within the backbone is performed by a combination of Conv, C2f and SPPF modules. In particular, the C2f module introduced in YOLOv8n enhances the efficiency of feature extraction and lays the foundation for further exploration and optimization. In the Neck module, YOLOv8n extends the PA-FPN architecture by eliminating specific convolutional layers during the up-sampling stage, thereby improving computational efficiency. The Detection Head incorporates a decoupled design that separates classification and localization branches, effectively resolving the inherent conflict between classification and regression tasks while enhancing overall detection performance. The complete network architecture is depicted in Fig 1.
2.2. The S2DB-mmWave YOLOv8n methodology
To improve the performance of the model in multi-target detection on mmWave radar heatmaps and to enhance its flexibility for deployment in complex environments, we propose the S2DB-mmWave YOLOv8n model, whose network architecture is shown in Fig 2. The main contributions of the work include three modifications. First, the backbone network architecture is optimized into SimBackbone, which replaces the traditional convolutions with the simplified convolution (SimConv) [24] module and adopts simplified spatial pyramid pooling-fast (SimSPPF) in place of the original SPPF module. Second, a new up-sampling technique, DySample [25], replaces the up-sampling module of the original YOLOv8n, recovering more detailed feature information of the object for fusion. Third, the BiFPN [26], including a bidirectional feature propagation mechanism and a feature weighting strategy, is integrated into the neck to optimize feature fusion; it addresses the issue of undifferentiated summation in conventional methods by enabling more effective fusion of multi-resolution feature maps with varying importance.
2.2.1. Backbone network improvements.
- A. SimConv
The structural design of SimConv is depicted in Fig 3. The key distinction between the SimConv and the original convolution of the YOLOv8n model (Conv) lies in the choice of activation functions: SiLU in Conv versus ReLU in SimConv. ReLU is computationally simpler, making it easier to implement and less susceptible to numerical instability compared to SiLU, which results in faster neural network training and inference.
Moreover, ReLU effectively mitigates the gradient vanishing issue commonly observed with activation functions like sigmoid or tanh, offering a more efficient alternative to the traditional sigmoid function. The mathematical formulation of the ReLU function is presented in Equation (1), and its corresponding function graph is illustrated in Fig 4.
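$$\mathrm{ReLU}(x) = \max(0, x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{1}$$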
- B. Pyramid Pooling Layer
Spatial pyramid pooling is an advanced feature fusion technique that integrates multi-scale features by mapping local features into a multi-dimensional space, effectively reducing information loss. In this paper, SimSPPF, an improved spatial pyramid pooling module, is integrated into the YOLOv8n object detection framework to enhance feature extraction. The core improvement of SimSPPF is replacing the Conv module of the YOLOv8n model with the SimConv module.
Fig 5 shows the detailed structure of SimSPPF. It first compresses the input feature map through convolution, extracts multi-scale features using three sequential MaxPool2D layers with identical configurations, and then fuses the extracted multi-scale features with a Concat operation. The fused features are fed into SimConv for up-scaling. As a result, the feature integration ability of the structure is enhanced while information loss during fusion is minimized, improving feature expressiveness and detection performance.
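For illustration, a minimal PyTorch sketch of this structure is shown below. It assumes the common SPPF hyperparameters (a 5×5 pooling kernel and halved hidden channels); the class and argument names are our own and may differ from the authors' implementation.

```python
import torch
import torch.nn as nn


class SimConv(nn.Module):
    """Conv + BatchNorm + ReLU (the SiLU used in the stock YOLOv8 Conv is replaced by ReLU)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class SimSPPF(nn.Module):
    """SPPF variant built from SimConv: 1x1 compression, three chained max-pools, concat, 1x1 expansion."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = SimConv(c_in, c_hidden, 1, 1)       # channel compression
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = SimConv(c_hidden * 4, c_out, 1, 1)  # fuse the concatenated branches and expand channels

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)   # chained pooling gives progressively larger receptive fields
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```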
2.2.2. Up-sampling technique.
The up-sampling process plays a critical role in feature pyramid network design, where kernel-based dynamic approaches like CARAFE [27], FADE [28], and SAPA [29] have demonstrated improved detection performance. In mmWave radar heatmaps, the presence of numerous small targets and low image resolution poses challenges for up-sampling. Traditional methods like nearest neighbor and bilinear interpolation have limited receptive fields and often lose critical information during multi-scale feature fusion. To address these issues, DySample, a lightweight dynamic up-sampling method, is integrated into the neck part to enhance model robustness to noise while significantly reducing parameter count and computational resource consumption.
The DySample workflow is shown in Fig 6. It adopts a group up-sampling strategy that partitions feature maps into multiple independent groups, with each group generating dedicated sampling offsets to minimize inter-feature interference. First, the feature map χ of size C × H × W is processed by the sampling point generator to produce a set of 2 × sH × sW sampling points δ, which encodes the contextual information. Then, δ and χ are fed into the grid sample function (grid_sample, as shown in Equation (2)) to produce a new C × sH × sW feature map χ′ for further fusion.
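$$\chi' = \mathrm{grid\_sample}(\chi, \delta) \tag{2}$$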
Fig 7 illustrates the sampling point generator. DySample takes a low-resolution feature map χ as input and first generates a dynamic range adjustment factor through a linear transformation layer (linear1), constraining its value within [0, 0.5] to control the sampling range. Subsequently, another linear transformation layer (linear2) produces an initial offset O_init. By combining the dynamic adjustment factor with O_init, a dynamic offset O of dimensions 2s² × H × W is generated, which is then reshaped to a 2 × sH × sW offset O′. The final sampling set δ is obtained by summing O′ with the original sampling grid G. Fig 8 displays key parts of the pseudocode within the DySample module.
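The following PyTorch sketch illustrates the sampling logic described above (offset prediction, range scaling, grid construction, and grid_sample). It is a simplified single-group variant with assumed layer shapes, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DySampleLite(nn.Module):
    """Simplified dynamic upsampler: two point-wise layers predict per-pixel offsets that
    perturb a regular sampling grid, and grid_sample resamples the input feature map."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # linear1 -> dynamic range factor in [0, 0.5]; linear2 -> initial offset O_init
        self.range_factor = nn.Conv2d(channels, 2 * scale * scale, 1)
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        s = self.scale
        factor = torch.sigmoid(self.range_factor(x)) * 0.5  # constrain to [0, 0.5]
        o = self.offset(x) * factor                          # dynamic offset, shape (B, 2*s*s, H, W)
        o = F.pixel_shuffle(o, s)                            # reshape to (B, 2, sH, sW)

        # Regular sampling grid G in normalized [-1, 1] coordinates
        ys = torch.linspace(-1, 1, s * h, device=x.device)
        xs = torch.linspace(-1, 1, s * w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)

        # Final sampling set delta = G + O' (offsets converted to normalized units), then resample
        delta = grid + o.permute(0, 2, 3, 1) * 2.0 / torch.tensor([w, h], device=x.device)
        return F.grid_sample(x, delta, mode="bilinear", align_corners=True)
```

In this sketch, `DySampleLite(256, scale=2)(feat)` would double the spatial resolution of a 256-channel feature map while dynamically adjusting the sampling positions.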
2.2.3. Weighted bidirectional feature pyramid networks.
Conventional feature pyramid networks (e.g., PANet [30], whose structure is shown in Fig 9) often suffer from information loss and inefficient feature propagation when dealing with multi-scale targets. The use of simple addition (Add) or concatenation (Concat) does not adequately account for the relative importance of each feature during fusion. To address these problems, BiFPN, an efficient multi-scale feature fusion module, is introduced in the neck part to improve detection by fusing features from different layers. Fig 10 illustrates the BiFPN network structure, which contains a bidirectional information transfer mechanism, and shows the feature maps of both inputs and outputs. This mechanism fuses feature maps of different resolutions through top-down and bottom-up paths. Feature weights are assigned in a learnable manner to enhance important features and suppress redundant information. By removing nodes with only a single input edge and creating skip lateral connections between peer nodes, BiFPN effectively enhances feature fusion, reduces redundant computation, and enriches the feature representation of the model.
Using the P6-level feature map (illustrated in Fig 10) as an example, BiFPN generates two fused features: the top-down aggregated feature $P_6^{td}$ and the output feature $P_6^{out}$, formally expressed as:

$$P_6^{td} = \mathrm{Conv}\!\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot \mathrm{Resize}\!\left(P_7^{in}\right)}{w_1 + w_2 + \varepsilon}\right) \tag{3}$$

$$P_6^{out} = \mathrm{Conv}\!\left(\frac{w_1' \cdot P_6^{in} + w_2' \cdot P_6^{td} + w_3' \cdot \mathrm{Resize}\!\left(P_5^{out}\right)}{w_1' + w_2' + w_3' + \varepsilon}\right) \tag{4}$$

where $P_5^{out}$, $P_6^{in}$, $P_6^{td}$, $P_6^{out}$, and $P_7^{in}$ denote the output feature at level 5, the input feature at level 6, the intermediate feature at level 6, the output feature at level 6, and the input feature at level 7, respectively; Conv represents the convolution operation; Resize refers to upsampling or downsampling operations; $w_i$ and $w_i'$ are learnable weight parameters; and $\varepsilon$ is set to 0.0001 to ensure numerical stability.
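A minimal PyTorch sketch of the weighted fusion used in Equations (3) and (4) is given below; the module name, channel count, and convolution block are illustrative assumptions rather than the exact YOLOv8n neck implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion: learnable non-negative weights
    normalized to sum to one (plus epsilon for numerical stability)."""
    def __init__(self, n_inputs, channels, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats):
        # feats: list of feature maps already resized to the same resolution
        w = F.relu(self.w)              # keep weights non-negative
        w = w / (w.sum() + self.eps)    # normalize: w_i / (sum_j w_j + eps)
        fused = sum(w[i] * feats[i] for i in range(len(feats)))
        return self.conv(fused)


# Hypothetical usage for the P6 node: P6_td fuses P6_in with the resized P7_in,
# while P6_out fuses P6_in, P6_td, and the resized P5_out.
fuse_td = WeightedFusion(n_inputs=2, channels=256)
fuse_out = WeightedFusion(n_inputs=3, channels=256)
```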
3. Experiment
3.1. The dataset
The dataset used in this paper is based on the public mmWave radar dataset [18], which was acquired using the second-generation single-chip mmWave radar AWR2944; four radar frames are extracted per second, with each frame corresponding to a camera frame. The relevant parameters of the radar are listed in Table 2. The radar data are converted into a Cartesian representation and recorded as heatmaps. The heatmaps and the camera data are then compared against the corresponding labels. The Roboflow tool is used to label the target areas, ensuring compatibility with the YOLO model.
The acquisition and labeling of mmWave radar heatmaps is a complex and time-consuming process, and the available data volume is often insufficient. To address this challenge, data augmentation techniques are applied to the original public dataset to expand the training set, improving the model's generalization and robustness. The data augmentation parameters on the Roboflow platform are set as follows: random brightness adjustment between −5% and +5%, and salt-and-pepper noise applied to 0.14% of pixels. As a result, the dataset is expanded to over 2,500 images.
3.2. Experimental environment and configuration
All experiments were conducted using PyCharm on a 64-bit Windows 11 operating system with Python version 3.8.0. The hardware configuration includes an NVIDIA GeForce GTX 1650 GPU and an Intel(R) Core (TM) i5-9300H CPU @ 2.4GHz. The hyperparameters used for training are listed in Table 3.
3.3. Model evaluation metrics
In object detection, precision, recall, and mAP are commonly used to evaluate detectors. Precision measures the proportion of correctly identified positive samples among all those predicted as positive, while recall indicates the proportion of actual positive samples that are correctly detected. The formulas for precision and recall are presented in Equation (5) and Equation (6):
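$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$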
where TP refers to the detector correctly identifying annotated objects, FP denotes the detector mistakenly predicting background regions as annotated objects, and FN represents the detector incorrectly classifying annotated objects as background regions.
mAP is an important metric for object recognition. It integrates precision and recall into a single value, providing a holistic assessment of a model's accuracy across different thresholds. mAP@0.5 represents the mean average precision calculated at an Intersection over Union (IoU) threshold of 50%, and its formula is given in Equation (7). It is often used to assess the overall object localization performance of a model. mAP@0.5:0.95 refers to the mean average precision calculated over multiple IoU thresholds ranging from 50% to 95% (with a step size of 5%), as described in Equation (8). It provides a more comprehensive assessment of a model's detection performance by considering both localization accuracy and confidence calibration, making it a more rigorous and reliable metric for evaluating object detection models.
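In their commonly used forms, these metrics can be written as:

$$\mathrm{mAP@0.5} = \frac{1}{C}\sum_{c=1}^{C} AP_c\big|_{IoU=0.5} \tag{7}$$

$$\mathrm{mAP@0.5{:}0.95} = \frac{1}{10}\sum_{t\in\{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{mAP@}t \tag{8}$$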
In these equations, C is the total number of categories in the dataset, AP_c is the AP for category c, and AP, shown in Equation (9), is obtained by calculating the precision values at different recall levels and then averaging them.
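$$AP = \int_{0}^{1} p(r)\,\mathrm{d}r \tag{9}$$

where p(r) denotes the precision as a function of recall.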
This paper uses frames per second (FPS) to evaluate the inference speed of the model. The core idea behind calculating FPS is to measure the total time required for the model to process a certain number of images and then divide the number of images by this total time to obtain the number of frames processed per second, as shown in Equation (10).
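In symbols, with $N_{\mathrm{images}}$ denoting the number of processed images and $T_{\mathrm{total}}$ the total processing time:

$$\mathrm{FPS} = \frac{N_{\mathrm{images}}}{T_{\mathrm{total}}} \tag{10}$$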
Giga Floating-Point Operations (GFLOPs) is a metric of computational cost, indicating the number of billions of floating-point operations required to process an input. When describing model performance, GFLOPs is commonly used to assess the computational complexity and efficiency of a model when processing data.
3.4. Experimental results and analysis
3.4.1. Comparison with different models.
To validate the algorithmic advancements of our approach, we conduct comprehensive comparative experiments on the mmWave radar thermogram dataset. The proposed model is rigorously evaluated against state-of-the-art detection architectures including RT-DETR [31], YOLOv9 [32], and YOLOv11 [33]. Table 4 gives the relative performance across all evaluated methods.
Optimizing for high precision alone may come at the cost of missed detections, whereas optimizing for high recall alone may lead to more false positives. The mAP metric addresses this trade-off by harmonizing precision and recall, thereby providing a comprehensive performance measure. As shown in Table 4, the proposed S2DB-mmWave YOLOv8n model achieved the best performance in terms of mAP, improving mAP@0.5 by 3.3% over YOLOv8n and 2% over YOLOv11n, a significant gain compared with the YOLO series models. Compared with Pix2Pix+YOLOv7-PM, although S2DB-mmWave YOLOv8n exhibits a 3% reduction in precision, this is accompanied by a decrease of 89.1M in model parameters, and the other performance metrics are notably enhanced. The S2DB-mmWave YOLOv8n model achieves high accuracy with a computational complexity of 8.1 GFLOPs, positioning it in the lower-middle range among the comparative baseline models. This indicates well-preserved feature extraction capabilities without substantial computational overhead. The model attains a competitively high FPS, demonstrating real-time processing efficiency. Furthermore, its efficient architecture design ensures favorable scalability and hardware compatibility, enabling deployment across diverse operational scenarios. Experimental results demonstrate that S2DB-mmWave YOLOv8n delivers superior object detection capabilities compared with the other models across multiple metrics on the millimeter-wave radar heatmap dataset.
3.4.2. Ablation experiments.
To validate the effectiveness of our work in multi-object detection for mmWave radar heatmaps, we conducted ablation experiments on three key components: SimBackbone, BiFPN, and DySample. Five experimental configurations were designed by progressively integrating these modules into the YOLOv8n framework. The contribution of each enhancement to overall detection performance is shown in Table 5.
The ablation results confirm that each proposed enhancement contributes positively to the overall performance. According to the data in Table 5, Experiment 1 uses the original YOLOv8n model, achieving 89.8% mAP@0.5, 54.2% mAP@0.5:0.95, 84.9% precision, and 82.9% recall. Experiment 2 introduced the BiFPN module to optimize multi-scale feature fusion by leveraging contextual information across different feature levels, improving mAP@0.5 and precision by 1.8% and 3.4%, respectively, compared with Experiment 1. Experiment 3, which replaced the original up-sampling with DySample, shows increases in mAP@0.5 and mAP@0.5:0.95 of 0.5% and 0.9%, respectively, indicating that DySample enhances detail restoration, preserves depth consistency in planar regions, and effectively handles gradual depth variations. In Experiment 4, after replacing the backbone network with SimBackbone, the model's mAP@0.5, precision, and recall improved by 1.5%, 2%, and 1.5%, respectively, suggesting that SimBackbone further improves multi-scale feature integration while mitigating gradient vanishing issues; by enhancing computational efficiency and simplifying implementation, it strengthens the model's feature extraction accuracy. In Experiment 5, BiFPN and DySample were added simultaneously. By replacing the fixed sampling method of BiFPN with DySample, the up-sampling process is dynamically adjusted and more detailed information is preserved. The results show that these added modules lead to increases of 2.5, 0.3, 1.7, and 6.2 percentage points in mAP@0.5, mAP@0.5:0.95, precision, and recall, respectively, improving detection accuracy. Compared with the baseline model, the proposed model not only achieves the best detection performance, with increases in mAP@0.5, mAP@0.5:0.95, precision, and recall of 3.3%, 1.6%, 4.5%, and 7.7%, respectively, but also maintains the FPS and GFLOPs, ensuring both efficiency and performance.
Figs 11–14 illustrate the gradual improvement in performance metrics across the entire enhancement process, emphasizing the contribution of each component to the model's detection capability. To provide a clearer view, the zoomed-in section highlights the performance changes between the 80th and 100th training epochs.
3.4.3. Visualization results analysis.
Figs 15 and 16 depict the recognition results of the baseline and improved models on the mmWave heatmap dataset, respectively. As observed from the figures, the improved model enhances performance in recognizing targets at all scales, with particularly significant improvements in small-target detection and localization precision. The dual validation through both visual comparisons and quantitative metric improvements (as shown in Figs 11–14) conclusively demonstrates that the proposed network achieves substantially enhanced robustness in addressing the inherent multi-scale detection challenges of mmWave radar data.
The confusion matrix provides a comprehensive evaluation of model performance across different target categories, with columns representing ground truth labels and rows indicating model predictions. This visualization effectively reveals both classification accuracy and common misclassification patterns for each target type. Figs 17 and 18 present the confusion matrices for the baseline and improved models respectively, enabling direct comparison of their recognition capabilities. The comparative analysis demonstrates the superior classification performance and the low misclassification rates of our enhanced model.
4. Conclusions
In this paper, we present an enhanced YOLOv8n model for multi-object detection in mmWave radar heatmaps. The proposed model integrates three key optimizations: (1) a redesigned backbone network to improve feature extraction, (2) DySample for advanced up-sampling, and (3) BiFPN for optimized multi-scale fusion. Extensive evaluations show significant improvements, achieving 93.1% mAP@0.5 and 55.8% mAP@0.5:0.95, with precision of 89.4% and recall of 90.6%, surpassing both the baseline YOLOv8n and comparison models. These advancements provide an effective framework for mmWave radar heatmap analysis, addressing challenges in low-resolution, multi-target detection, especially in challenging environments with dynamic lighting and complex backgrounds.
Due to time, environmental, and other constraints, there are still several areas for improvement and further expansion of this research: (1) exploring attention-based radar-camera fusion to compensate for missing spatial details; (2) investigating adaptive sparse training strategies to further enhance long-range small-target detection; and (3) addressing the remaining limitations of multi-object detection performed directly on millimeter-wave radar heatmaps. Radar heatmaps often suffer from low spatial resolution, severe noise, and object occlusion in dense scenes, making it difficult to distinguish overlapping targets. These challenges lead to degraded detection performance, particularly in scenarios with multiple closely spaced or weakly reflecting objects. Therefore, improving heatmap quality, enhancing instance-level feature separation, and incorporating cross-modal priors remain critical directions for future research.
References
- 1. Brookhuis KA, De Waard D, Janssen WH. Behavioural impacts of Advanced Driver Assistance Systems–an overview. European Journal of Transport and Infrastructure Research. 2019.
- 2. Jiang T, Kang R, Li Q. BSM-NET: multi-bandwidth, multi-scale and multi-modal fusion network for 3D object detection of 4D radar and LiDAR. Measurement Science and Technology. 2025;36(3):036107.
- 3. Yaqin Z, Yuqing S, Han W, Shengyang H, Puqiu L, Longwen W. High-precision Gesture Recognition Based on DenseNet and Convolutional Block Attention Module. Journal of Electronics and Information. 2024;46(3):967–76.
- 4. Huang Y, Zhang H, Deng K, Yang Y, Zhang R, Liu J. A review of automotive millimeter wave radar signal processing techniques. Radar Journal. 2023;12(5):923–70.
- 5. Gandhi PP, Kassam SA. Analysis of CFAR processors in nonhomogeneous background. IEEE Transactions on Aerospace and Electronic Systems. 1988;24(4):427–45.
- 6. Xu X, Li Y, Yeh C, Zhao B, Ding W, Zhang Y. IGAMF: Adaptive CFAR Detection and Blind Speed Sidelobe Suppression for High-Speed Target in Homogeneous Environment. IEEE Transactions on Aerospace and Electronic Systems. 2025.
- 7. Bourouz S, Baadeche M, Soltani F. Adaptive CFAR detection for MIMO radars in Pearson clutter. Signal, Image and Video Processing. 2025;19(4):1–7.
- 8. Sim Y, Heo J, Jung Y, Lee S, Jung Y. Design of Reconfigurable Radar Signal Processor for Frequency-Modulated Continuous Wave Radar. IEEE Sensors J. 2025;25(7):11601–12.
- 9. Bai W, Chen S, Ma J, Wang Y, Han C. Gesture Recognition with Residual LSTM Attention Using Millimeter-Wave Radar. Sensors (Basel). 2025;25(2):469. pmid:39860839
- 10. Luo Y, He Y, Li Y, Liu H, Wang J, Gao F. A Sliding Window-Based CNN-BiGRU Approach for Human Skeletal Pose Estimation Using mmWave Radar. Sensors (Basel). 2025;25(4):1070. pmid:40006298
- 11. Körner T, Batra A, Kaiser T, Pohl N, Schulz C, Rolfes I. Simultaneous Localization and Mapping (SLAM) for Room Exploration Using Ultrawideband Millimeterwave FMCW Radar. IEEE Journal of Microwaves. 2025.
- 12. Schumann O, Hahn M, Dickmann J, Wöhler C. Semantic segmentation on radar point clouds. In: 2018.
- 13. Danzer A, Griebel T, Bach M, Dietmayer K. 2D Car Detection in Radar Data with PointNets. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019. 61–6.
- 14. Gopireddy Palguna KR, Arun Kumar G, Ram G, Farukh Hashmi M. Lw-PSCNN: Lightweight Pointwise-Separable Convolution Neural Network for ISAR Image Classification. IEEE Trans Instrum Meas. 2025;74:1–8.
- 15. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8(1):53. pmid:33816053
- 16. Jiang W, Ren Y, Liu Y, Leng J. A method of radar target detection based on convolutional neural network. Neural Computing and Applications. 2021;33(16):9835–47.
- 17. Gupta S, Rai PK, Kumar A, Yalavarthy PK, Cenkeramaddi LR. Target Classification by mmWave FMCW Radars Using Machine Learning on Range-Angle Images. IEEE Sensors J. 2021;21(18):19993–20001.
- 18. Lamane M, Tabaa M, Klilou A. New Approach Based on Pix2Pix-YOLOv7 mmWave Radar for Target Detection and Classification. Sensors (Basel). 2023;23(23):9456. pmid:38067828
- 19. Kosuge A, Suehiro S, Hamada M, Kuroda T. mmWave-YOLO: A mmWave Imaging Radar-Based Real-Time Multiclass Object Recognition System for ADAS Applications. IEEE Trans Instrum Meas. 2022;71:1–10.
- 20. Kim J-C, Jeong H-G, Lee S. Simultaneous Target Classification and Moving Direction Estimation in Millimeter-Wave Radar System. Sensors (Basel). 2021;21(15):5228. pmid:34372465
- 21. Raimondi M, Ciattaglia G, Nocera A, Senigagliesi L, Spinsante S, Gambi E. mmDetect: YOLO-Based Processing of mm-Wave Radar Data for Detecting Moving People. IEEE Sensors J. 2024;24(7):11906–16.
- 22. Zhang A, Nowruzi FE, Laganiere R. Raddet: Range-azimuth-doppler based radar object detection for dynamic road users. In: 2021.
- 23. Tao Z, Ngui WK. A Review of Automatic Driving Target Detection Based on Camera and Millimeter Wave Radar Fusion Technology. Int J Automot Mech Eng. 2025;22(1):11965–85.
- 24. Norkobil Saydirasulovich S, Abdusalomov A, Jamil MK, Nasimov R, Kozhamzharova D, Cho Y-I. A YOLOv6-Based Improved Fire Detection Approach for Smart City Environments. Sensors (Basel). 2023;23(6):3161. pmid:36991872
- 25. Liu W, Lu H, Fu H, Cao Z. Learning to upsample by learning to sample. In: 2023.
- 26. Gupta S, Rai PK, Kumar A, Yalavarthy PK, Cenkeramaddi LR. Target Classification by mmWave FMCW Radars Using Machine Learning on Range-Angle Images. IEEE Sensors J. 2021;21(18):19993–20001.
- 27. Wang J, Chen K, Xu R, Liu Z, Loy CC, Lin D. Carafe: Content-aware reassembly of features. In: 2019.
- 28. Lu H, Liu W, Fu H, Cao Z. FADE: A Task-Agnostic Upsampling Operator for Encoder–Decoder Architectures. Int J Comput Vis. 2024;133(1):151–72.
- 29. Lu H, Liu W, Ye Z, Fu H, Liu Y, Cao Z. SAPA: Similarity-aware point affiliation for feature upsampling. Advances in Neural Information Processing Systems. 2022;35:20889–901.
- 30. Li YF, Lan HY, Xue JF, Guo JL, Huang RJ, Zhu JP. An algorithm for detecting dental caries in children based on multi-scale path aggregation. China Laser. 2024;51(15):1507207.
- 31. Zhao Y, Lv W, Xu S, Wei J, Wang G, Dang Q. Detrs beat yolos on real-time object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024.
- 32. Zhang Y, Zhou B, Zhao X, Song X. Enhanced object detection in low-visibility haze conditions with YOLOv9s. PLoS One. 2025;20(2):e0317852. pmid:39993001
- 33. Khanam R, Hussain M. Yolov11: An overview of the key architectural enhancements. arXiv preprint. 2024.