
Lightweight helmet target detection algorithm combined with Effici-Bi-Level Routing Attention

  • Yanguo Huang,

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

Affiliation College of Electrical and Automation, Jiangxi University of Science and Technology, Ganzhou, China

  • Minjie Fang ,

    Roles Software, Validation, Visualization, Writing – original draft

    6120220325@mail.jxust.edu.cn

Affiliation College of Electrical and Automation, Jiangxi University of Science and Technology, Ganzhou, China

  • Jian Peng

    Roles Formal analysis, Project administration, Validation, Visualization, Writing – review & editing

Affiliation College of Electrical and Automation, Jiangxi University of Science and Technology, Ganzhou, China

Abstract

Wearing a helmet is essential for two-wheeler riders to reduce the incidence of injuries caused by accidents. We present FB-YOLOv7, an improved detection network based on the YOLOv7-tiny model. The network addresses the missed detections and false detections that arise from the difficulty of identifying small targets and from the limited performance of deployment hardware during helmet detection. By applying an enhanced Bi-Level Routing Attention, the network improves its capacity to extract global features and reduces information distortion. Furthermore, we deploy the AFPN framework and effectively resolve information conflict through asymptotic adaptive feature fusion. Incorporating the EfficiCIoU loss significantly improves the accuracy of the prediction box. Experiments on two datasets show that FB-YOLOv7 attains 87.2% and 94.6% mean average precision (mAP@.5) while maintaining high efficiency at 129 and 126 frames per second (FPS). FB-YOLOv7 surpasses six other widely used detection networks in detection accuracy, deployment requirements, sensitivity to small targets, and potential for practical application.

1 Introduction

Recently, owing to the rapid advancement of new energy technology, electric bicycles and other two-wheeled vehicles have become increasingly popular. This is largely due to their energy efficiency, environmental friendliness, and convenience, which make them a preferred mode of transportation for both personal travel and delivery services. Nevertheless, as usage increases, traffic congestion and accidents are also increasing. In China, the absence of organised driving instruction and evaluation has left many cyclists with limited knowledge of traffic safety. Consequently, the mortality rate for bicycle and electric bicycle accidents rose from 4.86% in 2000 to 6.97% in 2019 [1]. Meanwhile, in the United States, the fatality rate for motorcyclists in traffic accidents is 28 times higher than that of car passengers, a figure that reached concerning levels in 2020 [2]. The majority of these fatalities are caused by brain injuries. Research has demonstrated that wearing a helmet properly can decrease the likelihood of a head injury by 60% and the death rate by 71% [3]. Hence, monitoring helmet usage plays a crucial role in mitigating fatalities from road accidents and enhancing drivers' awareness of safety, making it an imperative issue that cannot be disregarded.

Computer vision and machine learning have made substantial advances in road traffic analysis in recent years. Approaches to target detection fall primarily into traditional methods and deep learning-based algorithms. Deep learning is widely employed because of its effective feature extraction and high detection rate. Target detection algorithms have evolved into two categories: One-Stage and Two-Stage. The former, such as the YOLO [4] series and SSD [5], offer fast detection and are suited to the demands of road traffic. The latter, such as R-CNN [6] and Fast R-CNN [7], provide accurate recognition but are slower and structurally more complex. Consequently, the One-Stage approach is often preferred for road traffic detection.

Currently, the majority of scholars researching helmet detection employ the One-Stage method and seek to optimize it. Wu et al. [8] improved detection performance by replacing the YOLO v3 [9] backbone with Densenet [10]. Jin et al. [11] modified the output of the YOLOv4 [12] feature map to 4, added a 128 × 128 feature map output, and improved the feature fusion module to achieve feature reuse, thus obtaining better classification results. Xue et al. [13] enhanced the quality of retrieved features by integrating channel and spatial attention-weighted features with dense connection networks. On the other hand, Jia et al. [14] substantially increased the accuracy of the model by including the attention mechanism in the YOLOv5 method and utilizing the Soft-NMS [15] algorithm. In 2021, Lv [16] used the CenterNet [17] algorithm with the HOI (Human Object Interaction) to achieve real-time and precise detection of motorcyclists’ safety helmet usage through comprehensive labeling.

Although these optimisation strategies enhance the accuracy of helmet detection, unresolved issues remain. Because helmets are small, they are prone to missed detections and false detections under the influence of dense targets, occlusion, illumination, viewing angle, and other factors in surveillance footage. Furthermore, given the constrained capabilities of edge terminal devices, it is crucial to strike a balance between detection precision and the processing resources required for the helmet detection task.

To address the aforementioned issues, we propose the FB-YOLOv7 network, which builds upon YOLOv7-tiny. This paper's main contributions are as follows:

  1. A lightweight FB-YOLOv7 network is proposed, which combines the One-Stage algorithm with self-attention mechanism. It also adds the AFPN structure and optimises the loss function. This network’s purpose is to identify helmet use and vehicles in difficult road environments.
  2. In order to save computing resources, E-BRA is proposed to improve global search efficiency by filtering out regions with low correlation.

The remainder of this article is organised as follows: “Section 2: Method” provides a comprehensive explanation of the proposed network. The empirical findings are presented in “Section 3: Experiment Results”, and conclusions are drawn in “Section 4: Conclusion”.

2 Method

2.1 YOLOv7-tiny

YOLOv7 is a target detection network introduced by Wang et al. in 2022 [18]. Within the FPS range of 5 to 160, the YOLOv7 network demonstrates significant superiority in speed and accuracy over existing One-Stage algorithms. YOLOv7-tiny is a variant of YOLOv7 designed specifically for edge terminal devices. It consists of three primary components: Backbone, Neck, and Head. Fig 1 displays the structure.

In the Backbone section, a more compact ELAN is utilised instead of an E-ELAN for feature extraction, while MPConv is maintained for downsampling. In the Neck section, the features processed by SPPCSPC are combined using the PANet [19] structure. Lastly, the Head section employs the RepConv [20] module to enhance inference speed and generate prediction results of three distinct sizes.

2.2 FB-YOLOv7

This paper introduces the FB-YOLOv7 network, which aims to enhance the detection accuracy of helmet-wearing states and vehicles while minimising the chances of missing small targets. The network is an optimised version of YOLOv7-tiny, focusing on improving feature extraction, spatial feature fusion, and loss function.

The main improvement is to add the E-BRA module to the feature extraction process. By excluding low-correlation regions, attention is focused on high-correlation regions, allowing accurate feature extraction while minimising computational resources. The feature pyramid is upgraded to AFPN, and asymptotic adaptive spatial feature fusion is employed to preserve the information of low-level features and minimise the potential for information conflict. We also modify the loss function to ECIoU to enhance the network's resilience and its responsiveness to small targets. The subsequent sections explain each improvement in detail. Fig 2 illustrates the FB-YOLOv7 network structure.

2.2.1 Effici-Bi-Level Routing Attention.

In the practical implementation of helmet detection, the targets on the monitoring image are typically small and closely packed, making them vulnerable to intricate road conditions and weather variations. This study incorporates a self-attention technique to enhance the feature extraction capability of the model and minimise the rate of missed target detection. The self-attention mechanism enhances the network’s performance and efficiency by capturing the correlation between distinct points in the sequence. This allows the network to prioritise the most significant or relevant elements of the image. Nevertheless, the current Swin Transformer [21], ViT [22], CvT [23], and other models suffer from issues such as extensive computational requirements and high memory usage. Therefore, this study introduces an enhanced sparse self-attention mechanism module known as E-BRA (Effici-Bi-Level Routing Attention), based on the BRA (Bi-Level Routing Attention) [24].

The E-BRA primarily consists of three distinct sections, as illustrated in Fig 3. The first component is region partition and input projection. The input feature map $X \in \mathbb{R}^{H \times W \times C}$ is divided into $S \times S$ non-overlapping regions, and the query, key, and value are obtained by linear mapping:

$Q = X^r W^q$ (1)

$K = X^r W^k$ (2)

$V = X^r W^v$ (3)

where $X^r$ is the region-partitioned feature map and $W^q$, $W^k$ and $W^v$ are the projection weights of the query, key, and value, respectively.
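The following is a minimal PyTorch sketch of this first step, assuming an $S \times S$ grid and learnable projection weights; the module name, tensor layout, and default region count are illustrative choices rather than the authors' implementation.

```python
# Sketch of E-BRA step 1 (region partition + QKV projection); names and the
# (B, H, W, C) layout are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class RegionPartitionProjection(nn.Module):
    def __init__(self, channels: int, num_regions: int = 7):
        super().__init__()
        self.s = num_regions                      # S: regions per spatial axis
        self.w_q = nn.Linear(channels, channels)  # W^q
        self.w_k = nn.Linear(channels, channels)  # W^k
        self.w_v = nn.Linear(channels, channels)  # W^v

    def forward(self, x: torch.Tensor):
        # x: (B, H, W, C) feature map; H and W are assumed divisible by S
        b, h, w, c = x.shape
        s = self.s
        # Split into S*S non-overlapping regions of (H*W)/(S*S) tokens each
        x = x.view(b, s, h // s, s, w // s, c)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, s * s, -1, c)  # (B, S^2, N, C)
        # Linear projections corresponding to Eqs. (1)-(3)
        return self.w_q(x), self.w_k(x), self.w_v(x)
```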

The second component is the region-to-region routing graph. The correlation between two regions is obtained by multiplying the region-level query and key, yielding a correlation matrix, denoted $A^r$, that captures the relationship between different regions:

$A^r = Q^r (K^r)^\top$ (4)

where $Q^r$ and $K^r$ are the matrices containing the average values of the query and key in each region, respectively.

The correlation matrix $A^r$ depicts the interrelationship between regions in the feature map. To obtain the routing index matrix $I^r$, the portions with low correlation are eliminated, so that $I^r$ reflects the most relevant regions after screening:

$I^r = \mathrm{topk}(A^r, k)$ (5)

$k = \left|\{\, j : A^r_{ij} \geq \lambda^r \,\}\right|$ (6)

where k is the number of regions retained in $I^r$ and $\lambda^r$ is the correlation threshold.

The third part is token-to-token attention:

$K^g = \mathrm{gather}(K, I^r)$ (7)

$V^g = \mathrm{gather}(V, I^r)$ (8)

$O = \mathrm{Attention}(Q, K^g, V^g) + \mathrm{LE}(V)$ (9)

where gather() pools and concatenates all the tensors indexed by the routing index matrix, and LE() is the local enhancement of V by a depthwise convolution.
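As an illustration of the second and third steps, the sketch below computes the region affinity, selects routed regions with a fixed top-k as in the original BRA (the E-BRA thresholding variant is sketched after the next paragraph), gathers the corresponding key/value tokens, and applies token-to-token attention; the LE(V) branch is omitted for brevity.

```python
# Sketch of region routing (Eq. 4-6) plus gather and attention (Eq. 7-9);
# a simplified reading of BRA, not the authors' implementation.
import torch
import torch.nn.functional as F

def routed_attention(q, k, v, topk: int = 4):
    # q, k, v: (B, S^2, N, C) region-partitioned tokens from the projection step
    b, r, n, c = q.shape
    q_r, k_r = q.mean(dim=2), k.mean(dim=2)        # region-level query/key (B, S^2, C)
    a_r = q_r @ k_r.transpose(-2, -1)              # region affinity A^r, Eq. (4)
    i_r = a_r.topk(topk, dim=-1).indices           # routing index I^r, (B, S^2, topk)

    # gather(): collect key/value tokens of the routed regions for each query region
    idx = i_r[..., None, None].expand(-1, -1, -1, n, c)            # (B, S^2, topk, N, C)
    k_g = torch.gather(k.unsqueeze(1).expand(-1, r, -1, -1, -1), 2, idx).reshape(b, r, topk * n, c)
    v_g = torch.gather(v.unsqueeze(1).expand(-1, r, -1, -1, -1), 2, idx).reshape(b, r, topk * n, c)

    # Token-to-token attention within the routed regions, Eqs. (7)-(9);
    # the local-enhancement branch LE(V) is omitted here for brevity.
    attn = F.softmax(q @ k_g.transpose(-2, -1) / c ** 0.5, dim=-1)
    return attn @ v_g
```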

As depicted in Fig 4, the Transformer uses the complete feature map as its input, consuming a significant amount of computational resources. BRA selectively removes regions that are unrelated to target detection, improving the efficiency of feature extraction. However, there is a risk of eliminating desirable feature regions or preserving low-value ones. E-BRA differs from BRA in that it replaces the fixed capacity of $I^r$ with a dynamic capacity that varies with $\lambda^r$, so $\lambda^r$ can be adjusted to balance computational resources against the extent of the retained feature regions.
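A sketch of that dynamic capacity is given below: instead of a fixed top-k, a region is retained whenever its row-normalized affinity exceeds the threshold $\lambda^r$, so the number of routed regions varies per query region. This is an illustrative reading of E-BRA, with min-max normalization chosen for the sketch rather than taken from the paper.

```python
# Dynamic routing mask for E-BRA (illustrative): threshold-based region
# selection replacing BRA's fixed top-k.
import torch

def dynamic_routing_mask(a_r: torch.Tensor, lambda_r: float = 0.5) -> torch.Tensor:
    # a_r: (B, S^2, S^2) region affinity matrix A^r
    a_min = a_r.amin(dim=-1, keepdim=True)
    a_max = a_r.amax(dim=-1, keepdim=True)
    a_norm = (a_r - a_min) / (a_max - a_min + 1e-6)   # scale each row to [0, 1]
    # Raising lambda_r keeps fewer regions (less computation); lowering it keeps
    # more regions (more context). The most correlated region is always kept.
    return a_norm >= lambda_r
```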

To assess the efficacy of the E-BRA module suggested in this paper, the YOLOv7-tiny model is employed to evaluate its merits and drawbacks. Table 1 displays the outcomes of the experiment.

Table 1. Comparison of the parameters and task performance of the improved model.

https://doi.org/10.1371/journal.pone.0303866.t001

According to the data presented in Table 1, the enhanced E-BRA outperforms BRA in terms of both accuracy and F1 score. Furthermore, there has been a substantial enhancement in detection efficiency. The results provide clear evidence of the upgraded E-BRA’s effectiveness in fulfilling the reduced computational resource needs of edge terminal devices.

2.2.2 Asymptotic feature pyramid network.

The small size of helmets in most images makes them easy to overlook when only high-level features are used in helmet recognition. Although the YOLOv7-tiny model fuses all feature layers in a fixed manner through PANet, this approach has drawbacks: it consumes significant computational resources and can yield suboptimal results due to information loss during transmission. To solve this problem, this paper introduces AFPN [25] to fuse features from different levels.

Fig 5 illustrates the main implementation of AFPN through the introduction of an ASF module, which adaptively fuses features of different levels through weighted averaging and alignment. The fusion formula is as follows:

$P^l_{ij} = \alpha^l_{ij} \cdot x^{1 \to l}_{ij} + \beta^l_{ij} \cdot x^{2 \to l}_{ij} + \gamma^l_{ij} \cdot x^{3 \to l}_{ij}$ (10)

where $\alpha^l_{ij}$, $\beta^l_{ij}$ and $\gamma^l_{ij}$ denote the spatial weights of the features from the three levels at layer $l$, and the constraint is $\alpha^l_{ij} + \beta^l_{ij} + \gamma^l_{ij} = 1$.
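A compact sketch of this adaptive fusion is shown below, assuming the three inputs have already been resized to the resolution of level l; the per-pixel weights are produced by 1x1 convolutions and normalized with softmax so that they sum to one, matching the constraint in Eq. (10). The module name and weight-generation scheme are illustrative.

```python
# Sketch of adaptive spatial feature fusion (ASFF-style), Eq. (10).
import torch
import torch.nn as nn

class AdaptiveSpatialFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per input produces a per-pixel weight logit
        self.weight_convs = nn.ModuleList(nn.Conv2d(channels, 1, 1) for _ in range(3))

    def forward(self, x1, x2, x3):
        # x1, x2, x3: (B, C, H, W) features from three levels, aligned to level l
        logits = torch.cat([conv(x) for conv, x in zip(self.weight_convs, (x1, x2, x3))], dim=1)
        w = torch.softmax(logits, dim=1)            # alpha + beta + gamma = 1 per pixel
        return w[:, 0:1] * x1 + w[:, 1:2] * x2 + w[:, 2:3] * x3
```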

The process in AFPN involves the fusion of adjacent features at a low level, followed by the fusion of the resulting features with higher-level features. This fusion method does not directly combine features with a significant disparity in size, hence addressing the semantic gap that exists between non-adjacent layers. This novel progressive fusion technique circumvents the notable disparities in feature fusion across various sizes, efficiently harnesses multi-scale feature data, and preserves the characteristics of each level.

2.2.3 Loss function improvement.

At present, YOLOv7-tiny typically uses the CIoU [26] loss function. Because CIoU considers the overlap area, centre point distance, and aspect ratio in bounding box regression, its regression loss computation is relatively precise. The CIoU loss function has the following expression:

$\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$ (11)

$v = \frac{4}{\pi^2}\left(\arctan\frac{\omega^{gt}}{h^{gt}} - \arctan\frac{\omega}{h}\right)^2$ (12)

$\alpha = \frac{v}{(1 - IoU) + v}$ (13)

where $\rho^2(b, b^{gt})$ denotes the squared Euclidean distance between the centre points of the predicted frame and the real frame, c denotes the diagonal length of the smallest closed region that contains both the predicted and real frames, α denotes the weight coefficient, v measures the consistency of the aspect ratio between the predicted frame and the real frame, $\omega^{gt}$ and $h^{gt}$ are the width and height of the real frame, and ω and h are the width and height of the predicted frame.
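For reference, a plain PyTorch computation of Eqs. (11)-(13) for boxes in (x1, y1, x2, y2) form is sketched below; it is a standalone illustration of the CIoU formula rather than the training code used in this work.

```python
# Reference computation of the CIoU loss, Eqs. (11)-(13).
import math
import torch

def ciou_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # pred, gt: (N, 4) boxes as (x1, y1, x2, y2)
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    # Squared centre distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((pred[:, :2] + pred[:, 2:]) / 2 - (gt[:, :2] + gt[:, 2:]) / 2).pow(2).sum(dim=1)
    enc_lt = torch.min(pred[:, :2], gt[:, :2])
    enc_rb = torch.max(pred[:, 2:], gt[:, 2:])
    c2 = (enc_rb - enc_lt).pow(2).sum(dim=1) + eps

    # Aspect-ratio consistency term v and its weight alpha
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_g, h_g = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_g / (h_g + eps)) - torch.atan(w_p / (h_p + eps))).pow(2)
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```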

Nevertheless, the CIoU loss function possesses two inherent defects that must not be overlooked. The value domain of the inverse tangent function in the formula for calculating the penalty term in the CIoU loss function is limited to the range of (0, π/2). However, this conflicts with the requirement for numerical normalization. To address this issue, it becomes necessary to introduce new coefficients to achieve normalization, which in turn increases the computational complexity. Furthermore, the penalty term exhibits excessive sensitivity to abnormal situations, leading to a diminished robustness of the penalty term and more pronounced oscillations in the loss value. These two characteristics are particularly noticeable in edge terminal devices that lack significant computing power resources.

This research proposes the adoption of the ECIoU [27] loss function as a more efficient and direct alternative to address the aforementioned issues and compensate for the shortcomings of the CIoU loss function. The ECIoU loss function can be expressed as: (14) (15) (16)

The primary idea of the ECIoU loss function is to take the aspect ratio of the real frame as the input domain of a sigmoid function and to optimise the penalty term of the loss function from a functional point of view. The penalty term produced in this manner has a value domain of (0, 0.25), which aligns more closely with the requirements of numerical normalization while preserving the original information. Furthermore, owing to the characteristics of the sigmoid function, the penalty term θ is more robust and changes more gradually than the original penalty term. Hence, the ECIoU regression loss function leads to faster convergence, better localization, improved model performance, and heightened sensitivity to smaller targets.

The comparison graph of ECIoU and CIoU, shown in Fig 6, clearly demonstrates that the ECIoU curve has a smoother and more consistent trajectory. Furthermore, it reliably generates outputs with lower losses throughout numerous iterations. This highlights the effectiveness of the penalty term in the optimization function of ECIoU.

Fig 6.

(a) Box loss comparison of loss function. (b) Cls loss comparison of loss function.

https://doi.org/10.1371/journal.pone.0303866.g006

3 Experiment results

3.1 Experimental environment

To ensure fair comparison, all tests in this study are conducted with identical hardware and software configurations, so that differences in results are attributable solely to the models themselves. The experiments were performed on a Windows 11 system using an NVIDIA GeForce RTX 3080 Laptop GPU with 16 GB of video memory. The PyCharm environment used in this experiment was configured with PyTorch 2.0.1, Python 3.11, and CUDA 11.7.

The experimental training parameters are set as follows: the initial learning rate is 0.0001, the batch size is 4, the image size is 640, the Adam optimizer is selected for optimization, the weight decay coefficient is 0.0005, the number of epochs is 300, and the learning rate momentum parameter is 0.94. Warmup training is used, with one-dimensional linear interpolation to update the learning rate; after warmup, the cosine annealing algorithm updates the learning rate.
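The sketch below wires up a schedule of this shape (Adam, linear warmup followed by cosine annealing); the warmup length and the mapping of the momentum parameter onto Adam's beta1 are assumptions for illustration, not values taken from the paper.

```python
# Illustrative optimizer/scheduler setup matching the described hyperparameters.
import math
import torch

def build_optimizer_and_scheduler(model, epochs: int = 300, warmup_epochs: int = 3):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                                 betas=(0.94, 0.999), weight_decay=5e-4)

    def lr_lambda(epoch: int) -> float:
        if epoch < warmup_epochs:
            # Linear interpolation from ~0 up to the base learning rate
            return (epoch + 1) / warmup_epochs
        # Cosine annealing after warmup
        t = (epoch - warmup_epochs) / max(epochs - warmup_epochs, 1)
        return 0.5 * (1 + math.cos(math.pi * t))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```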

3.1.1 Datasets.

This paper uses two datasets of individuals wearing helmets. The first is the Helmet detection dataset from Roboflow Universe, which primarily consists of images of bicycles and other two-wheeled vehicles. It contains a total of 4,311 images, which have been re-labelled into five categories: With Helmet, Without Helmet, Motorcycle, Bicycle, and Electric Bike. The second is the Daylight-v1 dataset, also from Roboflow Universe, which primarily consists of 4,374 images of electric vehicle riders, categorised into three tags: With Helmet, Without Helmet, and Vehicles. Both datasets are partitioned into training, validation, and test sets in an 8:1:1 ratio.
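A simple sketch of the 8:1:1 split is shown below; the shuffling seed and the use of image paths as the split unit are illustrative assumptions.

```python
# Illustrative 8:1:1 train/validation/test split of a list of image paths.
import random

def split_dataset(image_paths, seed: int = 0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(0.8 * len(paths))
    n_val = int(0.1 * len(paths))
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test
```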

3.1.2 Model evaluation.

In this paper, common metrics, namely precision, recall, F1 score, and mean average precision (mAP), are used to objectively compare the models' performance in detecting helmet wearing:

$Precision = \frac{TP}{TP + FP}$ (17)

$Recall = \frac{TP}{TP + FN}$ (18)

$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (19)

where TP is the number of correctly predicted positive samples, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative.
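As a reference, Eqs. (17)-(19) translate directly into the following helper; it only assumes that the TP, FP, and FN counts are already available.

```python
# Reference computation of precision, recall, and F1 from Eqs. (17)-(19).
def precision_recall_f1(tp: int, fp: int, fn: int, eps: float = 1e-9):
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1
```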

In this paper, mAP@.5 and mAP@.5:95 are selected as the evaluation metrics. mAP@.5 is the value obtained when the IoU threshold is set to 0.5, while mAP@.5:95 is the average of the values obtained as the IoU threshold is increased in steps of 0.05 from 0.5 to 0.95.

$mAP = \frac{1}{N} \sum_{i=1}^{N} \int_{0}^{1} P_i(R)\, dR$ (20)

where N is the number of categories and $P_i(R)$ is the precision-recall curve of category i.

Furthermore, to thoroughly assess overall model performance, the number of model parameters and the frames processed per second (FPS) are employed as evaluation metrics to gauge the complexity of the model and the speed of detection, respectively.
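The FPS figure can be measured along the lines of the sketch below: average the per-image inference time after a short warm-up; the model and image iterable are placeholders, and a CUDA device is assumed as in the experimental setup above.

```python
# Illustrative FPS measurement: timed inference after a GPU warm-up.
import time
import torch

@torch.no_grad()
def measure_fps(model, images, warmup: int = 10) -> float:
    model.eval()
    for img in images[:warmup]:          # warm-up iterations, not timed
        model(img)
    torch.cuda.synchronize()             # assumes a CUDA device
    start = time.perf_counter()
    for img in images[warmup:]:
        model(img)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return len(images[warmup:]) / elapsed
```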

3.2 Experimental results and analysis

3.2.1 Comparison of Baseline networks.

To assess the efficacy of FB-YOLOv7, this study initially evaluates FB-YOLOv7 in comparison with YOLOv7-tiny using the primary evaluation criteria. The outcomes are presented in Table 2, using a classification accuracy criterion of 0.5.

Table 2. The overall and classification accuracy of YOLOv7-tiny and FB-YOLOv7 on two datasets.

https://doi.org/10.1371/journal.pone.0303866.t002

Table 2 demonstrates that FB-YOLOv7 surpasses YOLOv7-tiny in the accuracy metrics on both datasets. On the Daylight-v1 dataset, FB-YOLOv7 outperforms YOLOv7-tiny in mAP@.5 by a margin of 3.4%. In terms of individual classes, the improvement for the motorbike class is modest at 0.9%, whereas the helmet class improves significantly by 5.9%. On the Helmet detection dataset, FB-YOLOv7 shows a significant improvement in the more complex multi-class scenario, with an increase of 2.8% in mAP@.5. In the single-class results, there are noticeable improvements of 2% for motorcycles, 2.3% for bicycles, and 1.9% for electric bikes, along with boosts of 3.5% and 4.3% for the With Helmet and Without Helmet classes, respectively, which were initially less accurate. In summary, FB-YOLOv7 exhibits superior detection accuracy and excels at accurately identifying small targets.

3.2.2 Ablation experiment.

In this study, we conduct experiments using different modules and their combinations on both datasets to evaluate the effectiveness of the proposed enhancements. This allows for a comparison analysis. We employ the same settings during the training phase of all trials to guarantee the precision of the experiments. The outcomes are displayed in Table 3, where A represents E-BRA, B represents AFPN, and C represents ECIoU.

The results show that FB-YOLOv7 achieves a noteworthy enhancement in accuracy on the Daylight-v1 dataset when the three modifications are applied. Of all the modules, the E-BRA module has the most significant impact: it raises the mAP@.5 and F1 values by 2.9% and 4%, respectively, although it does decrease the detection speed. This outcome demonstrates that the E-BRA module extracts image features more efficiently, enhancing the network's expressive capacity. Meanwhile, the AFPN module provides improvements of 0.4% and 3.3% and enhances the detection efficiency, demonstrating the benefits of the asymptotic feature pyramid in preserving features across layers. The ECIoU loss leads to improved performance, yielding a 0.3% increase in mAP@.5 and a 2.3% increase in F1 without adding further parameters, confirming its usefulness. The results from the combined implementation of these modules outperform those of any single module, indicating that the modules operate harmoniously without contradictions.

On the Helmet detection dataset, the accuracy and F1 value exhibit a modest reduction as the number of labels increases. However, Table 3’s information suggests that all three development suggestions still apply to this specific data set. The utilisation of the E-BRA module resulted in a 2.2% rise in mAP@.5 and a 1.2% increase in F1 values. Similarly, the AFPN module led to a 0.5% increase in mAP@.5 and a 0.4% increase in F1 values. The improvement in ECIoU-Loss also played a role in enhancing the results. The performance of a multi-module combination surpasses that of a single module, providing further evidence of the resilience of FB-YOLOv7, as observed in the helmet detection dataset. FB-YOLOv7 has demonstrated its capacity to adapt to intricate and variable real-world situations, regardless of whether they involve binary classification or multi-class classification.

To summarise, FB-YOLOv7 has three improvement points that effectively enhance accuracy and F1 value without any conflicts. This optimisation leads to improved overall performance of the model. Additionally, FB-YOLOv7 is highly adaptable and resilient, making it suitable for both two-class and multi-class classification tasks.

3.2.3 Mainstream model performance comparison.

To ascertain the superiority of FB-YOLOv7 over existing mainstream detection models, namely Faster RCNN, YOLOv3, YOLOv5, YOLOv7, and YOLOv7-tiny, we test these classical methods on the Helmet detection and Daylight-v1 datasets. The findings are displayed in Fig 7 and Table 4.

Fig 7. (a) Comparison of mainstream models on Helmet detection. (b) Comparison of mainstream models on Daylight-v1.

https://doi.org/10.1371/journal.pone.0303866.g007

Table 4. Experimental data from the mainstream model on each of the two datasets.

https://doi.org/10.1371/journal.pone.0303866.t004

Compared with YOLOv7-tiny, FB-YOLOv7 reduces the FPS by 35 but improves AP@.5 and F1 by 2.8% and 1.8%, respectively. Compared with YOLOv7, FB-YOLOv7 reduces AP@.5 by 1.1% but improves F1 and FPS by 1.1% and 18.1%, respectively. These results indicate that FB-YOLOv7 prioritises the balance between detection speed and accuracy, making it suitable for practical applications. Furthermore, FB-YOLOv7 holds comprehensive advantages over other prevalent detection methods. Compared with Faster RCNN, YOLOv3, and YOLOv5, FB-YOLOv7 improves AP@.5 by 38.7%, 25.8%, and 20.3%, respectively, and F1 by 32.2%, 19.9%, and 16.1%, respectively, while achieving substantial FPS gains of 101, 70, and 37 over the same models. The same conclusions hold on the Daylight-v1 dataset, further demonstrating the superiority of FB-YOLOv7.

By conducting extensive algorithm comparison studies, we can deduce that FB-YOLOv7 demonstrates a substantial enhancement in detection speed while upholding a high level of accuracy. The balance of FB-YOLOv7 enables its effective deployment on edge devices with limited resources. Furthermore, FB-YOLOv7 exhibits significant benefits across several popular detection networks, excelling in both accuracy and speed. This demonstrates its robust potential and broad suitability in real-world application settings. The features of FB-YOLOv7 make it a dependable option for maintaining exceptional performance in many conditions.

3.2.4 Visual comparison.

Fig 8 shows the detection results of YOLOv3, YOLOv5, YOLOv7-tiny, and FB-YOLOv7 in two different situations to illustrate how the improved algorithm performs. Comparing (a), (b), and (c), it is evident that although the benefits of FB-YOLOv7 are not obvious when detecting large, easily recognised targets, its enhanced loss function yields exceptional accuracy in recognising helmets that are difficult to identify. Comparing the original image with images (d), (e), and (f), the original image contains a high density of small targets, which increases the likelihood of missed detections. While YOLOv3 and YOLOv7-tiny exhibit some degree of missed detection, FB-YOLOv7 misses none. FB-YOLOv7 greatly reduces missed detections by exploiting E-BRA's screening and incorporation of global information and AFPN's preservation of low-level features. In summary, the improved algorithm greatly enhances the capacity to recognise small and occluded targets while effectively reducing missed and incorrect detections, providing more dependable technical support for real-time monitoring systems.

Fig 8. Visualization of detection results for different models.

https://doi.org/10.1371/journal.pone.0303866.g008

4 Conclusion

This paper introduces a novel network, FB-YOLOv7, designed to accurately recognise the rider's helmet-wearing status and vehicle type, and to address the challenge of detecting small targets in helmet detection for two-wheeled vehicles. The algorithm uses the YOLOv7-tiny framework as its foundation and incorporates the E-BRA module, the AFPN structure, and the ECIoU loss function, resulting in a substantial enhancement in the capacity to capture global information and in the sensitivity to small targets. This research evaluates the efficacy of FB-YOLOv7 through tests on the Helmet detection and Daylight-v1 datasets. Experimental results demonstrate that FB-YOLOv7 reduces deployment prerequisites and fulfils the criteria for operating on edge terminal devices, while exhibiting a significant enhancement in both accuracy and F1 value, affirming its potential for practical applications. In summary, FB-YOLOv7 has been demonstrated to be effective and can have a significant impact in the domain of helmet detection. However, the network still has ample room for enhancement. In future work, our primary objectives are to gather more extensive data in intricate settings, with particular emphasis on improving the detection rate and applicability in challenging environments, and to refine the network architecture to establish detection schemes with superior functionality and performance.

References

  1. Li Y, Chen Q, Ma Q, et al. Injuries and risk factors associated with bicycle and electric bike use in China: A systematic review and meta-analysis[J]. Safety Science, 2022, 152: 105769.
  2. Ganga A, Kim E J, Tang O Y, et al. The burden of unhelmeted motorcycle injury: A nationwide scoring-based analysis of helmet safety legislation[J]. Injury, 2023, 54(3): 848–856.
  3. Høye A. Bicycle helmets–To wear or not to wear? A meta-analyses of the effects of bicycle helmets on injuries[J]. Accident Analysis & Prevention, 2018, 117: 85–97.
  4. Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779–788.
  5. Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I. Springer International Publishing, 2016: 21–37.
  6. Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580–587.
  7. Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440–1448.
  8. Wu F, Jin G, Gao M, et al. Helmet detection based on improved YOLO V3 deep model[C]//2019 IEEE 16th International Conference on Networking, Sensing and Control (ICNSC). IEEE, 2019: 363–368.
  9. Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
  10. Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4700–4708.
  11. Jin Y, Wu X, Dong H, Yu L, Zhang W. Improved YOLO v4 algorithm for safety helmet wearing detection[J]. Computer Science, 2021, 48(11): 268–275.
  12. Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[J]. arXiv preprint arXiv:2004.10934, 2020.
  13. Xue R, Hao Y, Zhang Z, Huang X, Lu H, Zhao H. Helmet wearing detection algorithm based on improved YOLOv3[J]. Electronic Measurement Technology (in Chinese), 2021, 44(12): 115–120.
  14. Jia W, Xu S, Liang Z, et al. Real-time automatic helmet detection of motorcyclists in urban traffic using improved YOLOv5 detector[J]. IET Image Processing, 2021, 15(14): 3623–3637.
  15. Bodla N, Singh B, Chellappa R, et al. Soft-NMS—improving object detection with one line of code[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 5561–5569.
  16. Lv J. Design and Implementation of a Real-time Detection and Alarm System for Safety Helmets Based on Deep Learning[D]. Southwest University, 2021.
  17. Duan K, Bai S, Xie L, et al. CenterNet: Keypoint triplets for object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 6569–6578.
  18. Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 7464–7475.
  19. Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8759–8768.
  20. Soudy M, Afify Y, Badr N. RepConv: A novel architecture for image scene classification on Intel scenes dataset[J]. International Journal of Intelligent Computing and Information Sciences, 2022, 22(2): 63–73.
  21. Liu Z, Lin Y, Cao Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 10012–10022.
  22. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
  23. Wu H, Xiao B, Codella N, et al. CvT: Introducing convolutions to vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 22–31.
  24. Zhu L, Wang X, Ke Z, et al. BiFormer: Vision Transformer with Bi-Level Routing Attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 10323–10333.
  25. Yang G, Lei J, Zhu Z, et al. AFPN: Asymptotic Feature Pyramid Network for object detection[J]. arXiv preprint arXiv:2306.15988, 2023.
  26. Zheng Z, Wang P, Ren D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2021, 52(8): 8574–8586.
  27. Yu J, Wu T, Zhang X, et al. An efficient lightweight SAR ship target detection network with improved regression loss function and enhanced feature information expression[J]. Sensors, 2022, 22(9): 3447.