
Implementation of resource-efficient fetal echocardiography detection algorithms in edge computing

Abstract

Recent breakthroughs in medical AI have proven the effectiveness of deep learning in fetal echocardiography. However, the limited processing power of edge devices hinders real-time clinical application. We aim to pioneer the future of intelligent echocardiography equipment by enabling real-time recognition and tracking in fetal echocardiography, ultimately assisting medical professionals in their practice. Our study presents the YOLOv5s_emn (Extremely Mini Network) Series, a collection of resource-efficient algorithms for fetal echocardiography detection. Built on the YOLOv5s architecture, these models use backbone substitution, pruning, and inference optimization to achieve, while maintaining high accuracy, a significant reduction in size and parameter count, amounting to only 5%–19% of YOLOv5s. Tested on the NVIDIA Jetson Nano, the YOLOv5s_emn Series demonstrated superior inference speed, being 52.8–125.0 milliseconds per frame (ms/f) faster than YOLOv5s, showcasing their potential for efficient real-time detection in embedded systems.

Introduction

Echocardiography has significant advantages such as portability, the absence of ionizing radiation, high temporal resolution, and low costs. Accurate and reliable echocardiographic assessment is the premise of high-quality clinical decision-making [1]. Fetal echocardiography, for example, has been widely used. It can accurately diagnose fetal congenitally unguarded tricuspid valve orifice [2] and has positive effects on reducing maternal anxiety [3]. Over the past decade, there has been an increasing application of artificial intelligence technology in cardiovascular imaging, including echocardiography, as it may reduce medical costs and avoid unnecessary tests [4]. It may be increasingly advantageous to introduce artificial intelligence technology for the identification of echocardiography images [5, 6]. However, there is a relative lack of research on its application in fetal echocardiographic image recognition [7, 8]. Fetal echocardiography images have lower resolution and signal-to-noise ratio compared to ordinary echocardiography images, resulting in poor image quality and blurred object edges. Only internal echo signals of objects can be obtained, making it difficult to directly recognize objects. In addition, the fetus can freely rotate and roll over in the uterus, making its posture and position difficult to predict, which increases the difficulty of recognition. Accuracy is highly dependent on physician experience, making it difficult to adopt in primary care hospitals.

Traditional machine learning, an early branch of artificial intelligence, saw rapid advancement through algorithms like decision trees, which seek to minimize information entropy based on information theory; support vector machines, rooted in statistical learning; Bayesian classifiers, derived from Bayesian decision theory; and ensemble learning, which merges multiple learners to tackle tasks. Despite achieving notable results in specific applications, these traditional algorithms face challenges in universality and intelligence. In recent years, neural networks have emerged as a groundbreaking development in the field of machine learning. The first artificial neuron model, the MCP neuron, was conceptualized in 1943 by Warren McCulloch and Walter Pitts, who examined the connectivity of human brain neurons to develop this model. Connectionism, underpinned by neural networks, did not become a dominant technique until 1986, when D.E. Rumelhart and colleagues [9] introduced the backpropagation algorithm, enabling neural networks to address a multitude of practical issues effectively. In 2006, Geoffrey Hinton unveiled deep belief networks in his paper titled "A fast learning algorithm for deep belief nets" [10]. This development addressed the training challenges of deep neural networks and spurred swift progress in neural network algorithms. In 2009, Professor Li Fei-Fei of Stanford University initiated the ImageNet dataset [11], facilitating the development of seminal convolutional neural networks such as AlexNet [12], GoogLeNet [13], VGGNet [14], ResNet [15], and SENet [16] over the subsequent eight years, thus significantly propelling the advancement of deep learning technology. Deep learning models have demonstrated remarkably diverse and impressive performance in various subfields of artificial intelligence, such as sustainable urban development, the evolution of broadcasting, and the advancement of radio technology [17–19]. The integration of these AI technologies with real-life applications has achieved state-of-the-art results.

In recent years, computer vision technology has been developing at a rapid pace, with object detection algorithms becoming a research hotspot. Object detection algorithms can be roughly divided into two categories: two-stage object detection algorithms and single-stage object detection algorithms. Two-stage object detection algorithms are renowned for their high accuracy. Representative algorithms include R-CNN [20], Fast R-CNN [21], and Mask R-CNN [22]. However, the detection speed of these algorithms is relatively slow, making them unsuitable for deployment on embedded terminals to perform real-time object detection tasks. In contrast, single-stage object detection algorithms have garnered attention for their fast detection speed. These algorithms use regression-based methods to directly predict object categories and locations in a single step. Well-known single-stage object detection algorithms include the SSD (Single Shot MultiBox Detector) [23] algorithm, RetinaNet [24], and the YOLO (You Only Look Once) [25–27] algorithm. Among them, the YOLO algorithm has evolved over time, and multiple versions have been released.

Multiple convolutional neural network (CNN) architectures have recently been put forward to handle numerous challenges, boasting improvements in model lightness and inference speed on embedded devices. These architectures include efficient convolutional neural networks that implement depthwise convolution structures, such as SqueezeNet [28], EfficientNet [29], MobileNets [30, 31], and GhostNet [32]. MobileNetV3, a new iteration introduced by Howard, A. and his team at Google Brain, is the follow-up to MnasNet [33].

To enable real-time edge computing for fetal echocardiography, current detection models usually require video transmission to GPUs or TPUs with sufficient computational power, leading to high hardware costs. Moreover, severe network latency during data transmission can cause significant frame loss. This can result in the omission of crucial information. Even though integrating embedded devices with adequate computational capabilities into ultrasound equipment can reduce network impact, the cost remains substantial. Therefore, it is crucial to research how to deploy efficient fetal heart detection algorithm models on embedded terminals for edge computing. By integrating embedded devices equipped with efficient algorithms into ultrasound instruments, a direct connection between the ultrasound machine and the embedded device is established. This eliminates network and external influences and ensures smooth operation. As these algorithms are efficient, they significantly lower the hardware requirements for embedded devices, thereby reducing costs.

The main contributions of this research are:

  1. We have provided a dataset named FE-Section Detection (FE-SD). The dataset was collected from case data at the Hebei Maternity Hospital, ensuring that it originates from real clinical settings. All images in the dataset were preprocessed to a consistent size and annotated for training the experimental models.
  2. The backbone of the YOLOv5s model was optimized using MobileNetV3 methods. Model pruning and inference optimization were also performed. Five lightweight models were obtained according to model size. We collectively refer to these five lightweight models as the YOLOv5s_emn (Extremely Mini Network) Series. Training results show that these methods greatly reduced training parameters and model complexity.
  3. The YOLOv5s_emn Series models have undergone inference and recognition accuracy tests on the NVIDIA Jetson Nano embedded platform. These tests indicate that the series consistently delivers inference speeds 52.8–125.0 milliseconds per frame faster than YOLOv5s, and it exhibits commendable recognition accuracy.

Materials and methods

Ethics statement

This study was approved by the Medical Ethics Committee of Shijiazhuang Fourth Hospital (approval number: No. 20220070), and written informed consent was signed by all patients.

Dataset

The dataset utilized for fetal echocardiography image recognition in this research was sourced from the medical records of Hebei Maternity Hospital from January 1, 2017 to June 15, 2022. The dataset is available as S1 and S2 Datasets. This study was approved by the Ethics Committee of Shijiazhuang Fourth Medical Hospital, and written informed consent was obtained from all patients. We started accessing the dataset for research purposes in August 2022. A total of 2,798 fetal echocardiography images were collected. For deep learning training, we divided the ultrasound images into four categories: outflow tract section, abdominal transverse section, four-chamber heart section, and three-vessel section. We used the labeling tool LabelMe [34] to label all the images. Next, we randomly divided the image data of each type into training, validation, and testing subsets following a 7:2:1 distribution (for the 'Three vessels' category, which had fewer data, we ensured the quantity of the test set by dividing approximately in a 6:2:2 ratio). The distribution of the dataset samples is shown in Table 1. The images in the dataset were preprocessed before training by resizing them to 640×640 pixels.
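For illustration, the class-wise 7:2:1 split (6:2:2 for the three-vessel category) described above could be reproduced with a short Python script along the following lines; the folder layout, file extension, and output paths are assumptions for this sketch rather than the exact preprocessing pipeline used in the study.

```python
import random
import shutil
from pathlib import Path

# Minimal sketch of the class-wise train/val/test split described above.
# Paths, the per-class folder layout, and the file extension are assumptions.
random.seed(0)
src_root = Path("FE-SD/images")   # hypothetical: one sub-folder per section class
dst_root = Path("FE-SD/splits")

ratios = {"default": (0.7, 0.2, 0.1), "three_vessels": (0.6, 0.2, 0.2)}

for class_dir in sorted(src_root.iterdir()):
    images = sorted(class_dir.glob("*.png"))
    random.shuffle(images)
    train_r, val_r, _ = ratios.get(class_dir.name, ratios["default"])
    n_train = int(len(images) * train_r)
    n_val = int(len(images) * val_r)
    subsets = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for subset, files in subsets.items():
        out_dir = dst_root / subset / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out_dir / f.name)   # copy image into its split folder
```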

Algorithm-related YOLOv5s

The findings of the paper "YOLO9000: Better, Faster, Stronger" [35] have shown that one-stage detectors achieve faster detection speeds than two-stage detectors, allowing for real-time or near-real-time object detection. Consequently, our research direction is centered around one-stage detection models. To validate this, we conducted a series of experiments by training and inferencing several representative models from the YOLO family on our fetal cardiac echocardiography dataset on a local machine. As the experimental results in Table 2 show, YOLOv5s and YOLOv5n stand out from the other models, boasting substantially fewer parameters and swifter inference speeds. Between these two models, YOLOv5s showcases a superior mean Average Precision (mAP) compared to YOLOv5n. Taking into account the importance of selecting a model that demonstrates well-rounded and balanced attributes, we ultimately opted for YOLOv5s as the cornerstone architecture for our research.

The overall structure of the YOLOv5s model is shown in Fig 1. At the input end, the algorithm adds Mosaic data augmentation, adaptive anchor box calculation, and adaptive image scaling. The first convolutional layer of the backbone network uses a convolution kernel size of 6, the first two convolutional layers of the neck network use a kernel size of 1, and the remaining convolutional layers employ a kernel size of 3. In the C3 structure, Bottleneck blocks are employed to enhance the fusion of information from different network levels. The backbone end combines Conv and C3 structures, and in the neck network a Concat layer is inserted between the backbone and the final head output layer, which greatly improves both speed and precision. Therefore, YOLOv5s is suitable for the detection task of fetal echocardiography.

Fig 1. The overall structure of the YOLOv5s model.

The YOLOv5s framework is structured around the input module, the backbone, the neck, and the head. Within its architecture, it incorporates key elements like the C3 module, conventional convolutions (Conv), bottlenecks, SPPF (Spatial Pyramid Pooling—Fast), and prediction layers.

https://doi.org/10.1371/journal.pone.0305250.g001

Algorithm-related MobileNetV3

MobileNetV3 introduces a new non-linearity, named h-swish, to replace the ReLU6 activation function used in MobileNetV2. The swish function has been instrumental in enhancing model accuracy; however, it comes with a significant computational overhead, as shown in Eq (1). The h-swish function, on the other hand, reduces the computational requirements, as defined in Eq (2). Moreover, h-swish offers higher accuracy and, unlike the sigmoid and other saturating activation functions, does not suffer from saturation problems. Importantly, the computation speed of h-swish is noticeably faster than that of the swish and sigmoid functions.

$$\mathrm{swish}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}} \tag{1}$$

$$\text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6} \tag{2}$$
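For reference, Eqs (1) and (2) can be expressed as a minimal PyTorch sketch (not the study's code); PyTorch also ships a built-in nn.Hardswish that implements the same piecewise-linear approximation.

```python
import torch
import torch.nn as nn

def swish(x: torch.Tensor) -> torch.Tensor:
    # Eq (1): swish(x) = x * sigmoid(x)
    return x * torch.sigmoid(x)

def h_swish(x: torch.Tensor) -> torch.Tensor:
    # Eq (2): h-swish(x) = x * ReLU6(x + 3) / 6
    return x * nn.functional.relu6(x + 3.0) / 6.0

x = torch.linspace(-6, 6, 7)
print(swish(x))
print(h_swish(x))          # close to swish, but only cheap piecewise-linear ops
print(nn.Hardswish()(x))   # PyTorch's built-in equivalent of Eq (2)
```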

MobileNetV3 utilizes the Platform-Aware NAS method. Initially, it establishes the overall network architecture via a block-level search. Subsequently, it fine-tunes the network layers layer-by-layer using NetAdapt [36]. These two processes play complementary roles in the global and local search scopes, respectively. Furthermore, MobileNetV3 has optimized the structures of the input and output stages, reducing the computational burden without compromising accuracy. As depicted in Fig 2, the MobileNetV3 block incorporates the idea of integrating the Squeeze-and-Excite (SE) module and the inverted residual block in parallel, a concept inspired by SENet [37]. The SE module is added on top of the inverted residual block. The SE module compresses the input feature maps along the channel dimension, and the resulting output scale can enhance important features and weaken nonessential ones, making the extracted features more directionally focused.
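As a simplified sketch of the idea in Fig 2, the following PyTorch code combines a squeeze-and-excite module with an inverted residual block; the channel counts, expansion factor, and reduction ratio are illustrative assumptions, not the exact MobileNetV3 configuration.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel attention: global pooling -> two 1x1 convs -> per-channel scale."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class InvertedResidualSE(nn.Module):
    """Simplified MobileNetV3-style block: expand -> depthwise -> SE -> project."""
    def __init__(self, c_in: int, c_out: int, expand: int = 4, stride: int = 1):
        super().__init__()
        c_mid = c_in * expand
        self.use_residual = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False), nn.BatchNorm2d(c_mid), nn.Hardswish(),
            nn.Conv2d(c_mid, c_mid, 3, stride, 1, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.Hardswish(),
            SqueezeExcite(c_mid),
            nn.Conv2d(c_mid, c_out, 1, bias=False), nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 16, 80, 80)
print(InvertedResidualSE(16, 16)(x).shape)  # torch.Size([1, 16, 80, 80])
```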

Fig 2. The overall structure of the MobileNetV3 block.

The MobileNetV3 Block comprises an input layer, an expansion convolution, a Depthwise separable convolution, an SE (Squeeze-and-Excitation) block, a projection layer, and an output layer.

https://doi.org/10.1371/journal.pone.0305250.g002

Given the strengths of MobileNetV3 mentioned above, we decided to employ this model to refine the YOLOv5s architecture. This enhancement is designed to maintain high accuracy while significantly reducing parameter count, aligning with our objectives.

YOLOv5s_emn series

This section introduces the overall framework of the YOLOv5s_emn Series and the classification of the models proposed in this study. The architecture of the YOLOv5s_emn Series is illustrated in Fig 3. In the backbone section, we replace the YOLOv5s backbone with a combination of MobileNetV3 blocks and an SPPF layer. First, the image is processed by the MobileNetV3 blocks, and then a fixed-dimensional adaptive output is produced through the SPPF. SPPF is an optimized version of the SPP [38] method developed by the author of YOLOv5; its purpose and calculation formula are the same, but the structure is slightly modified, so that SPPF reduces the computational load and improves the model's speed compared to SPP. The Spatial Pyramid Pooling method connects convolutional layers whose outputs have undefined sizes to a fixed-size fully connected layer. On one hand, it can generate a fixed-length output that adapts to different image input sizes, eliminating the need for preprocessing to unify image sizes. On the other hand, it can extract spatial feature data at various scales, thereby enhancing the model's robustness to different spatial arrangements and object distortions. The pooling window sizing originally proposed for SPP, given in Eq (3), was defined under specific conditions and initially lacked universality. Eq (4) is a modified version that takes into account the input size (w), output size (n), pooling window size (k), stride (s), and padding (p).

$$k = \left\lceil \frac{w}{n} \right\rceil,\qquad s = \left\lfloor \frac{w}{n} \right\rfloor \tag{3}$$

$$n = \left\lfloor \frac{w + 2p - k}{s} \right\rfloor + 1 \tag{4}$$
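A minimal sketch of an SPPF-style layer, written after the publicly described YOLOv5 design, is shown below together with the output-size rule of Eq (4); the channel counts and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

def pool_out_size(w: int, k: int, s: int, p: int) -> int:
    # Eq (4): n = floor((w + 2p - k) / s) + 1
    return (w + 2 * p - k) // s + 1

class SPPF(nn.Module):
    """SPPF-style layer: three chained max-pools with one kernel size, then concat.
    A simplified sketch of the YOLOv5 design; channel counts are illustrative."""
    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, bias=False)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

print(pool_out_size(w=20, k=5, s=1, p=2))                        # 20: size preserved
print(SPPF(256, 256)(torch.randn(1, 256, 20, 20)).shape)         # [1, 256, 20, 20]
```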
Fig 3. The overall structure of YOLOv5s_emn series model.

The YOLOv5s_emn Series model’s framework consists of four components: the input module, the backbone module, the neck module, and the prediction module. The input module’s duty is to feed the fetal echocardiography image data into the model. The backbone module is designed to extract features from the fetal echocardiography image. The neck module facilitates the integration of the extracted features. Meanwhile, the prediction module is responsible for predicting the four types of fetal echocardiography images.

https://doi.org/10.1371/journal.pone.0305250.g003

In the inference optimization section, we refined the hierarchical structure by replacing the original C3 module in the prediction stage with a Concat module. This change removes the need to compute one C3 module for each prediction, which in turn decreases the number of training parameters and computational layers, allowing for a smaller model size. Additionally, we increased the number of prediction scales from the original three in YOLOv5 to four, corresponding to the Concat layers at model layers 15, 19, 22, and 25. This modification enables the model to predict bounding boxes of four different sizes, enhancing flexibility and detection accuracy, and allowing prediction scales to be selected based on the specific scenario.
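To make the structure concrete, the following hedged sketch shows a YOLOv5-style Concat module and four per-scale 1×1 prediction heads; the channel widths and feature-map sizes are illustrative and do not reproduce the exact layer indices (15, 19, 22, 25) of the modified network.

```python
import torch
import torch.nn as nn

class Concat(nn.Module):
    """Channel-wise concatenation of feature maps, as used in YOLOv5 necks."""
    def __init__(self, dim: int = 1):
        super().__init__()
        self.dim = dim

    def forward(self, xs):
        return torch.cat(xs, dim=self.dim)

# Illustrative: four fused feature maps (one per prediction scale), each passed to a
# 1x1 conv head predicting num_anchors * (num_classes + 5) channels.
num_classes, num_anchors = 4, 3
heads = nn.ModuleList(
    nn.Conv2d(c, num_anchors * (num_classes + 5), 1) for c in (64, 128, 256, 512)
)
features = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256, 512), (80, 40, 20, 10))]
outputs = [head(f) for head, f in zip(heads, features)]
print([o.shape for o in outputs])   # one output tensor per prediction scale
```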

In the model pruning section, we can apply pruning techniques to both the backbone and neck sections of the model to decrease the number of training parameters further. Specifically, in the backbone section, we can eliminate some of the MobileNetV3 blocks. By selecting only 2–3 of the four prediction scales for inference, we can then prune the layers in the neck section that are not required for computation. These two strategies contribute to a significant reduction in the number of training parameters, resulting in a more lightweight model.

This study aims to adapt the model for real-world production use, considering the varying computing capabilities of different embedded devices. We have categorized YOLOv5s_emn into five distinct versions—YOLOv5s_emns, YOLOv5s_emnm, YOLOv5s_emnl, YOLOv5s_emnx, and YOLOv5s_emnxx—based on model size, computational resources, and accuracy requirements. The specific layers and prediction boxes for each model version are detailed in Table 3.

Implemented on edge computing platforms

The study utilizes a deep learning project developed with the PyTorch toolkit to train YOLOv5, YOLOv5s_ghost, YOLOv5s_Efficient_B0, and the five models of the YOLOv5s_emn Series. Subsequently, the models are deployed on the Jetson Nano embedded terminal for inference testing.

Initially, the models are trained using deep learning techniques. Once training is completed, the optimal weights for each model are chosen based on the results, yielding pt-format weight files and the performance metrics from the training process. These pt-format weights are then transferred to and deployed on the Jetson Nano development board, where inference performance metrics are obtained on the embedded device.
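A minimal sketch of the deployment step, assuming the Ultralytics torch.hub entry point is used to load the trained pt-format weights on the Jetson Nano; the weight file and test image names are hypothetical.

```python
import torch

# Load a trained checkpoint via the Ultralytics YOLOv5 hub wrapper (assumption:
# the custom weights are compatible with that entry point).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.hub.load("ultralytics/yolov5", "custom", path="yolov5s_emns_best.pt")
model.to(device).eval()

results = model("sample_four_chamber.png")   # hypothetical test frame
results.print()                              # class, confidence, and box per detection
```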

Model evaluation metrics

In this research, the model performance assessment metrics include precision, recall rate, mean Average Precision (mAP) with an IoU threshold of 0.5, single frame image inference time, the number of parameters, and model size. Precision is utilized to gauge the accuracy of model detection, as shown in Eq (5). Recall rate is utilized to assess the comprehensiveness of model detection, calculated as in Eq (6), where TP represents true positives, FN false negatives, and FP false positives. mAP signifies the overall performance across different confidence threshold levels, as defined in Eq (7).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{mAP} = \frac{1}{N}\sum_{i=1}^{N}\int_{0}^{1} P_i(R)\,dR \tag{7}$$

where N is the number of classes and P_i(R) denotes the precision-recall curve of class i.
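For clarity, Eqs (5)-(7) can be computed with a short NumPy sketch such as the following; the counts and the sample precision-recall points are illustrative.

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int):
    # Eq (5) and Eq (6)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    # Per-class AP: area under the precision-recall curve.
    # Eq (7) averages this value over all classes to obtain mAP.
    order = np.argsort(recalls)
    return float(np.trapz(precisions[order], recalls[order]))

p, r = precision_recall(tp=90, fp=10, fn=20)
print(p, r)                                   # 0.9, 0.818...
rec = np.array([0.0, 0.5, 0.8, 1.0])
prec = np.array([1.0, 0.9, 0.8, 0.6])
print(average_precision(rec, prec))           # AP for one illustrative class
```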

Results and discussion

Training and embedded platform

The model training employed 2 Tesla P4 GPUs with 8GB memory, Windows 10 64-bit OS, Python, and the PyTorch DL framework.

Model inference was carried out on the NVIDIA Jetson Nano development board. The board contains an NVIDIA Maxwell GPU with 128 CUDA cores and a quad-core ARM Cortex-A57 CPU with 4GB LPDDR4 memory. The Jetson Nano runs Ubuntu 18.04 OS and Python 3.6.

Training parameters setting

The model training adheres to consistent parameter settings. Each model is trained for 300 epochs with a batch size of 32. Pre-trained parameters are not utilized in any of the models. At the end of each epoch the model is saved, and the best-performing checkpoint at the conclusion of the training session is chosen as the final model.

Training model results evaluation

In the experiment, as the training epochs progress, the decline in the bounding box loss and object loss values clearly signals a trend towards model convergence, as illustrated in Fig 4. This typically implies that the model's performance on the target classification task is strong, or at the very least, that it performs well on the training data. These training results will be employed for further performance assessment.

Fig 4. The trends of the bounding box loss and object loss.

(a) Object loss, (b) Bounding box loss.

https://doi.org/10.1371/journal.pone.0305250.g004

Throughout the experiment, the model's precision and recall rates gradually improve with each training iteration, as shown in Fig 5. Generally, as the number of epochs increases, the precision of the model improves. After training for about 130 epochs, the precision and recall values gradually stabilize, reaching basic convergence, and the training progress hits a plateau. These training results will be employed for further performance assessment.

Fig 5. The trends of the precision and recall rate.

(a) Precision, (b) Recall rate.

https://doi.org/10.1371/journal.pone.0305250.g005

The training results are shown in Table 4. Compared with other models, the five models of the YOLOv5s_emn Series exhibit superior performance in terms of parameter count and model size, with no significant difference in precision and recall rate. It is important to point out that the term 'number of parameters' refers to the total number of parameters in the model, which, in convolutional neural networks, is primarily determined by the weights in each convolutional layer. The number of parameters impacts memory usage, the rate of model initialization, and the model's size, all of which subsequently influence the model's inference performance. Reducing the number of parameters reduces network complexity and the number of convolutional weights, thereby reducing computational complexity and making the model easier to deploy on platforms with lower computing power, which meets our requirements.
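For context, a model's parameter count and an approximate fp32 weight size can be reported with a few lines of PyTorch; the snippet below assumes the Ultralytics hub entry point for the baseline YOLOv5s and is illustrative only.

```python
import torch

# Load an untrained baseline YOLOv5s purely to count its parameters (assumption:
# network access to the Ultralytics hub; any nn.Module works the same way).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=False)

n_params = sum(p.numel() for p in model.parameters())
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2
print(f"{n_params / 1e6:.2f} M parameters, ~{size_mb:.1f} MB of weights (fp32)")
```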

Evaluation results and interpretation following model deployment

All models utilized in training were implemented on the Jetson Nano for performance evaluation. The results of these tests are displayed in Table 5. The precision, recall rate, and mAP@0.5 of the YOLOv5s_emn Series models show no significant difference compared to other models. However, their inference speed is 96.7–168.9 milliseconds per frame (ms/f), which is significantly improved compared to other models. The YOLOv5s_emnx model exhibits exceptional performance metrics on the Jetson Nano. Based on the experimental results, the deployment of the YOLOv5s_emn Series models promises to overcome the challenge of deploying deep learning algorithms in practical scenarios, which has been hindered by the relatively weaker image processing capabilities of edge devices.
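A hedged sketch of how per-frame inference latency can be measured on the Jetson Nano is shown below; the warm-up count, frame shapes, and model handle are assumptions for illustration.

```python
import time
import torch

def time_per_frame(model, frames, device="cuda", warmup=10):
    """Average single-frame inference latency in milliseconds."""
    model.to(device).eval()
    with torch.no_grad():
        for f in frames[:warmup]:                 # warm-up iterations, not timed
            model(f.to(device))
        if device == "cuda":
            torch.cuda.synchronize()              # ensure queued GPU work is done
        start = time.perf_counter()
        for f in frames:
            model(f.to(device))
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / len(frames)

# Hypothetical usage with 640x640 dummy frames:
# frames = [torch.randn(1, 3, 640, 640) for _ in range(100)]
# print(f"{time_per_frame(my_model, frames):.1f} ms/f")
```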

Visualization of test results

The test results on the Jetson Nano can be visualized. We choose to show the visualized prediction results of YOLOv5s_emns. The prediction results are shown in Fig 6, which clearly illustrates that the detection system is capable of precisely identifying the fetal echocardiography image within the video frame. The YOLOv5s_emn Series model achieves good detection results even under conditions such as speckle noise and artifacts in the image.

Fig 6. The results for the four categories of images.

(a) Abdominal transverse section, (b) Four-chamber view of the heart, (c) Outflow tract section, (d) Three-vessel section. The label "abdomen" indicates the abdominal transverse section, the label "four_chamber" indicates the four-chamber view of the heart, the label "vot" indicates the outflow tract section, and the label "three_vessels" indicates the three-vessel view. The end of each classification result is annotated with the classification confidence score.

https://doi.org/10.1371/journal.pone.0305250.g006

Conclusion

This study aims to solve the problem that the image processing capability of edge devices is relatively low, making it difficult to deploy deep learning algorithms in actual application scenarios. We propose a lightweight series of fetal echocardiography detection models called the YOLOv5s_emn Series. Replacing the backbone, pruning the model, optimizing inference, and minimizing the number of parameters decrease the overall size of the model. The results show that, after applying these three optimizations, the enhanced models implemented on the Jetson Nano terminal can significantly speed up the detection of fetal echocardiography images without significantly reducing detection accuracy. The series contains five versions to choose from for deployment on embedded devices, meeting the balance between model size, computing resources, and accuracy required in different scenarios. By implementing the object recognition model on the embedded device, this study achieves real-time detection of fetal echocardiography images and contributes to improving the level of medical intelligence. At present, our algorithm lacks the capability to discern the orientation of the fetal heart. Our subsequent research will prioritize addressing this limitation by integrating traditional visual processing techniques.

Supporting information

References

  1. Kusunose K, Haga A, Abe T, Sata M. Utilization of Artificial Intelligence in Echocardiography. Circulation Journal: official journal of the Japanese Circulation Society. 2019;83(8):1623–9. pmid:31257314
  2. Liu H, Yuan G, Li X, Song Y, Wang C, Zhang C. Diagnosis of fetal congenitally unguarded tricuspid valve orifice by echocardiography. Echocardiography (Mount Kisco, NY). 2022. pmid:36184263
  3. Akalın M, Yalçın M, Demirci O, İsmailov H, Sahap Odacilar A, Dizdarogulları GE, et al. Positive effects of fetal echocardiography on maternal anxiety: a prospective study in a tertiary center in Turkey. Journal of Psychosomatic Obstetrics and Gynaecology. 2022:1–8.
  4. Kuehn BM. Cardiac Imaging on the Cusp of an Artificial Intelligence Revolution. Circulation. 2020;141(15):1266–7. pmid:32282247
  5. Garcia-Canadilla P, Sanchez-Martinez S, Crispi F, Bijnens B. Machine Learning in Fetal Cardiology: What to Expect. Fetal Diagnosis and Therapy. 2020;47(5):363–72. pmid:31910421
  6. Sato M, Tateishi R, Yatomi Y, Koike K. Artificial intelligence in the diagnosis and management of hepatocellular carcinoma. Journal of Gastroenterology and Hepatology. 2021;36(3):551–60. pmid:33709610
  7. Qiao S, Pang S, Luo G, Pan S, Chen T, Lv Z. FLDS: An Intelligent Feature Learning Detection System for Visualizing Medical Images Supporting Fetal Four-Chamber Views. IEEE Journal of Biomedical and Health Informatics. 2022;26(10):4814–25. pmid:34156957
  8. Qiao S, Pan S, Luo G, Pang S, Chen T, Singh AK, et al. A Pseudo-Siamese Feature Fusion Generative Adversarial Network for Synthesizing High-Quality Fetal Four-Chamber Views. IEEE Journal of Biomedical and Health Informatics. 2022.
  9. Rumelhart DE, Hinton GE, Williams RJ. Learning Representations by Back-Propagating Errors. Nature. 1986;323(6088):533–6.
  10. Hinton GE, Osindero S, Teh YW. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation. 2006;18(7):1527–54. pmid:16764513
  11. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2009. p. 248–55.
  12. Krizhevsky A, Sutskever I, Hinton G. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems. 2012;25(2).
  13. Szegedy C, Liu W, Jia Y, Sermanet P, Rabinovich A. Going Deeper with Convolutions. IEEE Computer Society; 2014.
  14. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science. 2014.
  15. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
  16. Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020;42(8):2011–23. pmid:31034408
  17. Zheng Q, Tian X, Yu Z, Jiang N, Elhanashi A, Saponara S, et al. Application of wavelet-packet transform driven deep learning method in PM2.5 concentration prediction: A case study of Qingdao, China. Sustainable Cities and Society. 2023;92:104486.
  18. Zheng Q, Tian X, Yu Z, Wang H, Elhanashi A, Saponara S. DL-PR: Generalized automatic modulation classification method based on deep learning with priori regularization. Engineering Applications of Artificial Intelligence. 2023;122:106082.
  19. Zheng Q, Zhao P, Zhang D, Wang H. MR-DCAE: Manifold regularization-based deep convolutional autoencoder for unauthorized broadcasting identification. International Journal of Intelligent Systems. 2021.
  20. Girshick R, Donahue J, Darrell T, Malik J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. IEEE Computer Society; 2014.
  21. Girshick R. Fast R-CNN. In: Proceedings of the International Conference on Computer Vision; 2015.
  22. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017.
  23. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: Single Shot MultiBox Detector. 2015.
  24. Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017:2999–3007.
  25. Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, Real-Time Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA; June 2016. p. 779–88.
  26. Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal Speed and Accuracy of Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Seattle, WA, USA; June 2020.
  27. Redmon J, Farhadi A. YOLOv3: An Incremental Improvement. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA; June 2018. p. 89–95.
  28. Iandola FN, Han S, Moskewicz MW, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360; 2016.
  29. Tan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, PMLR; 2019. p. 6105–14.
  30. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861; 2017.
  31. Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, et al. Searching for MobileNetV3. In: Proceedings of the IEEE International Conference on Computer Vision; Seoul, Korea; 2019. p. 1314–24.
  32. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C. GhostNet: More Features from Cheap Operations. arXiv preprint arXiv:1911.11907; 2019.
  33. Tan M, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019.
  34. Russell BC, Torralba A, Murphy KP, Freeman WT. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision. 2008;77(1–3):157–73.
  35. Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017.
  36. Yang TJ, et al. NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018.
  37. Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. arXiv preprint arXiv:1709.01507; 2017.
  38. He K, Zhang X, Ren S, Sun J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. arXiv preprint arXiv:1406.4729; 2014.