Abstract
Accurate interpretation of single-line diagrams (SLDs) is crucial for analyzing electrical systems, as they encapsulate vital information about operational safety and efficiency in a simplified format. Traditional SLD processing methods rely on manual inspection and basic image analysis, which are computationally intensive, error-prone, and require extensive preprocessing. Although deep learning has been applied to symbol classification, existing models often fail to capture fine-grained symbol details, leading to misclassification. To address these limitations, this study proposes a hybrid deep learning-based symbol classification method. A newly created dataset was benchmarked using state-of-the-art deep learning models, and an optimal model was systematically designed, developed, and tested. The proposed approach integrates a Hybrid Residual Attention Module (HRAM) to enhance the model’s ability to identify fine-grained symbol details and a Proximity-aware Loss Function to improve performance in cluttered regions by penalizing misclassifications based on the spatial proximity of neighboring symbols. These modifications result in an optimized method for semantic processing in symbol classification tasks. The proposed model achieves 93.5% mean average precision (mAP), a 3.8% improvement over the top-performing baseline, alongside a 19.6% reduction in model parameters. These advancements contribute to more efficient and accurate semantic processing of SLDs, paving the way for improved analysis of electrical system diagrams.
Citation: Bhanbhro H, Kwang Hooi Y, Kusakunniran W, Zakaria MNB, Hashmi SAM, Amur ZH, et al. (2026) ESC-YOLOv8: An enhanced deep learning framework for semantic understanding of single-line diagram imagery. PLoS One 21(3): e0340719. https://doi.org/10.1371/journal.pone.0340719
Editor: Rajesh Kumar, National Institute of Technology, India (Institute of National Importance), INDIA
Received: March 7, 2025; Accepted: December 25, 2025; Published: March 11, 2026
Copyright: © 2026 Bhanbhro et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in this study are subject to confidentiality agreements and belong to PETRONAS Group Technical Solutions. Due to legal and contractual restrictions, the data cannot be shared publicly. Data access requests may be considered on a case-by-case basis and can be directed to the Universiti Teknologi PETRONAS Research Management Centre (email: info@utp.edu.my) with prior written permission from PETRONAS.
Funding: Not available at the time of submission.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: ASM, Active Shape Models; ASME, American Society of Mechanical Engineers; CNN, Convolutional Neural Network; DETR, Detection Transformer; HOG, Histogram of Oriented Gradients; HRAM, Hybrid Residual Attention Module; ICSET, International Conference on System Engineering and Technology; IOGP, International Association of Oil & Gas Producers; NMS, Non-Maximum Suppression; P&ID, Piping and Instrumentation Diagrams; PaL, Proximity-aware Loss; RGB, Red, green, and blue; RPN, Region Proposal Network; SGD, Stochastic Gradient Descent; SIFT, Scale-Invariant Feature Transform; SLD, Single-line diagram; SSD, Single Shot MultiBox Detector; YOLO, You Only Look Once
Introduction
The pursuit of automated scene interpretation has witnessed remarkable progress, propelled by advancements in machine learning methodologies [1]. However, the ability of machines to furnish comprehensive semantic descriptions of natural scenes derived from digital images remains conspicuously constrained, falling significantly short of human capabilities [2]. This disparity, often referred to as the “semantic gap” underscores the challenges inherent in endowing machines with the capacity to discern and interpret the intricate relationships between objects and their contextual surroundings [3]. Consequently, there is a growing need to leverage high-level context obtained from object detectors and scene classifiers to bridge this gap [4]. The recent progress in deep learning has introduced sophisticated instruments capable of acquiring semantic, high-level, and deeper features, offering avenues to tackle the limitations inherent in conventional architectures [1]. These tools are essential for enhancing the interpretation of complex visual data, especially in specialized domains like electrical engineering.
In this domain, SLDs serve as a fundamental visual language for representing electrical power systems, wherein the semantic processing of these diagrams is pivotal for tasks such as power system analysis, fault diagnosis, and automated design. The motivation of this work stems from the pressing need to develop robust, efficient, and generalizable methods for automated SLD interpretation, especially in industrial environments where manual processing is time-consuming and error-prone.
There is a growing demand for digital systems capable of processing and analyzing SLDs, driven by the need for efficiency, accuracy, and integration into modern workflows [3]. Digitization allows industries to transition from paper-based formats, which are prone to degradation and loss, to digital formats that can be easily edited, stored, and shared across teams using advanced software [1]. However, many organizations, especially those managing legacy projects, continue to rely on outdated paper-based or scanned drawings, which lack the interactive features needed for modern data extraction and integration [5,6]. A survey by the American Society of Mechanical Engineers (ASME) found that nearly 60% of engineering firms still maintain critical drawings in paper or non-editable digital formats, highlighting the urgent need for digitization [2].
Digitizing these drawings not only simplifies information extraction but also enhances the ability to update designs as components are replaced or modified due to maintenance over the lifecycle of a plant (Transforming Legacy Drawings into Digital Assets). This digital transformation enables project teams to maintain up-to-date inventories, streamline project management, and ensure compliance with evolving safety and regulatory standards [7]. For example, digitized SLDs are particularly valuable in power distribution and industrial settings where real-time access to updated schematics can significantly reduce downtime during troubleshooting and repairs [8,9]. Moreover, the International Association of Oil & Gas Producers (IOGP) has reported that digitized maintenance and design records can help to reduce operational inefficiencies by up to 25%, underscoring the financial and safety benefits of digital engineering drawings [10].
Against this backdrop, the objectives of this paper are to (i) design and develop an enhanced YOLOv8-based model for symbol detection and classification in SLDs, (ii) integrate novel mechanisms to reduce misclassifications and improve generalization, and (iii) validate the model across diverse datasets to demonstrate robustness and efficiency.
Recent advancements in deep learning have created new opportunities to address these challenges [10]. These models are particularly suited for symbol classification in SLDs due to their ability to learn complex patterns (as illustrated in Fig 1) and features that conventional methods struggle to capture. However, their application in SLDs remains underexplored due to the complex nature of SLD images and the need for extensive, annotated datasets of engineering symbols [11]. The visually similar and symmetrical nature of many symbols complicates differentiation, which can result in misclassification, adversely impacting system diagnostics and project timelines. Furthermore, the scarcity of publicly available, well-annotated datasets poses a significant barrier, limiting the development and testing of deep learning models tailored for SLDs [12]. This leads to the interesting possibility of classifying symbols using only a deep learning image classification model and a symbol image dataset.
To address these challenges, this study makes the following key contributions:
- A novel SLD image dataset designed for symbol classification tasks.
- An advanced deep learning model that integrates the HRAM to enhance feature extraction and capture fine-grained symbol details.
- A Proximity-aware Loss Function (PaL) customized to improve semantic processing and classification accuracy in dense and cluttered regions of SLDs.
Building on this approach, this study proposes a novel deep learning-based symbol classification model that leverages SLD images exclusively for classifying prevalent symbols. The proposed solution begins with annotated and reviewed SLD images, which are used to train the deep learning model. By automating symbol classification, this approach not only addresses the limitations of traditional methods but also reduces the reliance on specialized human expertise for operational and maintenance tasks. This innovation streamlines the process, enhancing efficiency and accuracy in SLD interpretation.
The remainder of this manuscript is organized as follows. Section 2 presents a detailed review of related work, focusing on existing methods and recent advances in symbol classification for SLDs. Section 3 describes the proposed methodology, including dataset development, model benchmarking, and the design of enhanced deep learning architecture. Section 4 reports the experimental results and performance evaluation of the proposed approach. Finally, Section 5 concludes the study, summarizing key findings and outlining directions for future research.
Related works
Existing methodologies for interpreting SLDs frequently encounter challenges in accurately extracting and interpreting complex symbols and relationships, often relying on rule-based systems or traditional image processing techniques that lack the adaptability to handle variations in diagram styles and complexities.
Traditional methods for symbol classification.
Traditional methods for symbol classification in engineering drawings, such as Template Matching, Rule-Based Methods, Feature-Based Classification, and Statistical Shape Modeling, have been widely used due to their simplicity and interpretability [13]. However, they struggle with symbol variability, overlapping elements, and complex layouts, particularly in SLDs [14]. Template Matching, implemented in OpenCV, relies on similarity measures but is highly sensitive to resolution, occlusion, and environmental complexity, leading to high false positives and negatives [14]. Heuristic and Rule-Based Methods, like those developed by Lee et al., use predefined geometric rules for classification but lack adaptability to new symbols and require extensive manual updates. Feature-Based Classification, using techniques like Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) with SVM or KNN classifiers, performs well on clean images but suffers in noisy or cluttered environments where symbols overlap [15].
Statistical Shape Modeling, particularly Active Shape Models (ASM), captures shape variations and is effective for deformable symbols, as demonstrated in [16] for circuit diagrams. However, ASM is computationally expensive, requires extensive preprocessing, and struggles with closely packed symbols, as noted in [17] for CAD environments.
Despite their contributions, these traditional methods are inherently limited in handling the complexities of real-world engineering drawings, highlighting the need for more robust and adaptive deep learning approaches for symbol classification [18].
Deep learning-based and transformer-based symbol classification.
The advancement of deep learning has greatly enhanced the performance of symbol recognition in technical drawings. Convolutional Neural Networks (CNNs) have shown remarkable accuracy in classifying electrical symbols and identifying their positions within SLDs [19]. These models are particularly useful for symbol detection, feature extraction, and structural interpretation due to their ability to learn hierarchical patterns from data. Two-stage and one-stage CNN models have been widely utilized, each with their unique strengths and limitations [20].
Two-stage object detectors, such as Faster R-CNN, Mask R-CNN, and Cascade R-CNN, employ a Region Proposal Network (RPN) to identify object regions before refining them in a classification stage, improving the detection of small or overlapping symbols in engineering drawings, as seen in Fig 2 [21]. These models leverage convolutional backbones and anchor-based mechanisms to handle varying symbol scales and orientations but often suffer from slower inference and higher computational demands [21]. Zhang et al. achieved 89% mAP in electrical schematics using CNNs but noted challenges with overlapping symbols and poor visual quality [22]. Kim et al. applied You Only Look Once (YOLO) for real-time symbol detection in SLDs, achieving a 90% detection rate but struggling with densely packed elements [23]. Liu et al. focused on symbol classification in P&IDs, reporting 87% mAP but highlighting difficulties with occluded symbols [24,25]. In their study, [26] proposed a hybrid CNN-LSTM approach to capture sequential symbol relationships in SLDs, reaching 84% mAP on synthetic datasets but facing generalizability issues in real-world applications. These studies highlight the strengths and limitations of two-stage models, emphasizing the need for adaptable solutions to handle real-world engineering drawings effectively.
One-stage models, such as YOLO, SSD (Single Shot MultiBox Detector), and EfficientNet, have transformed symbol classification tasks with their exceptional efficiency and real-time detection capabilities [27], as illustrated in Fig 3. Unlike two-stage detectors, which separate object proposal generation and classification, one-stage models integrate both processes into a single step, enabling rapid inference essential for applications like automated engineering drawing analysis [28]. YOLO’s grid-based approach simultaneously predicts bounding boxes and class probabilities, making it highly effective for processing dense, complex images [29]. However, these models often face challenges in balancing speed and accuracy, particularly with small, densely packed symbols in cluttered environments, where they may miss fine details or generate false positives [30]. Redmon et al. [28] pioneered YOLO, revolutionizing object detection by unifying classification and localization, establishing its prominence in engineering drawing analysis.
Recent studies have explored YOLO-based architectures for semantic understanding in diverse domains beyond engineering diagrams. For example, Qureshi et al. [30] proposed a hybrid approach combining semantic segmentation and YOLO detection for aerial vehicle imagery, achieving robust performance in complex environments with occlusions and varying scales. Their work underscores the adaptability of YOLO frameworks for tasks requiring precise object localization and classification in cluttered scenes, which aligns with the challenges addressed in SLD interpretation. Integrating attention mechanisms and custom loss functions, as in our proposed ESC-YOLOv8, further extends these principles to industrial diagram analysis.
In addition to aerial and industrial applications, YOLO-based models have been adapted for highly complex environments such as underwater scenes. Wang et al. [31] introduced YOLO-DBS, an improved YOLOv8 architecture optimized for detecting targets in challenging underwater imagery characterized by low visibility and clutter. Their approach leverages architectural refinements to enhance detection accuracy and efficiency under adverse conditions. This work demonstrates the versatility of YOLOv8 and reinforces the need for domain-specific enhancements, similar to our integration of HRAM and Proximity-aware Loss for symbol classification in densely packed SLDs.
Despite advancements in deep learning for symbol classification, challenges remain, such as the need for extensive datasets, addressing class imbalances, and enhancing model robustness against noise and varying conditions [29]. As summarized in Table 1, current research highlights the necessity for more adaptable deep learning models specifically designed to handle the complexities of engineering drawings, ensuring both accuracy and efficiency in symbol classification across diverse applications.
To further strengthen the adaptability and scalability of deep learning-based approaches, it is valuable to draw insights from adjacent research fields that tackle similar challenges in large-scale and complex systems. For instance, in the domain of edge computing, the study in [30] proposed a latency- and privacy-aware resource allocation framework for vehicular edge computing. Their work demonstrates how distributed and edge-based architectures can improve system responsiveness and data security, offering highly relevant strategies when deploying real-time deep learning models for industrial SLD analysis. Similarly, [32,33] applied large language models to method-level bug severity prediction using software metrics, showing how integrating domain-specific features with advanced deep learning architectures can enhance classification accuracy and robustness.
In addition, the management of large and heterogeneous datasets, such as varied SLDs from multiple industrial sources, benefits from dynamic resource provisioning and approximation strategies. The authors of [34,35] introduced the Data Variety Aware Resource Provisioning Architecture (DV-ARPA), a framework designed for big data resource provisioning, aligning with the need to handle diverse symbol representations efficiently. Complementing this, a study presented Gallup Approximation (Gapprox), which applies approximation techniques to big data processing, balancing computational cost with result accuracy [36,37,38]. These approaches provide valuable guidance for optimizing deep learning pipelines used in SLD symbol classification, particularly when aiming for industrial-scale deployment where both speed and precision are critical. Together, these related works provide a strong foundation for enhancing the scalability, efficiency, and reliability of deep learning-based symbol classification systems used in industrial power system analysis.
To address the limitations of traditional one-stage and two-stage models, recent research has introduced transformer-based architectures that significantly improve symbol recognition in structured diagrams. For example, SwinIR (Swin Transformer for Image Restoration), built on the Swin Transformer, excels in image restoration tasks by modeling long-range dependencies and leveraging hierarchical representations, making it highly effective for capturing fine details in dense engineering diagrams [39,40]. Building on this, Swin2SR (Swin Transformer Version 2 for Super-Resolution) extends the Swin Transformer Version 2 to enhance training stability and performance, especially under compressed image conditions [39,40]. In the detection domain, transformer-based frameworks like DETR (Detection Transformer) provide an end-to-end approach that eliminates the need for components such as non-maximum suppression by directly predicting object sets, offering improved precision for complex object detection tasks [41,42]. These innovations, along with hybrid models combining CNN backbones with transformers, present promising directions for advancing symbol classification accuracy, scalability, and robustness in technical drawings.
Beyond CNN/transformer architectures, graph-based learning provides a principled way to exploit the relational structure inherent in SLDs. Liu et al. [43] present a formal model for multi-agent Q-learning on graphs, in which agents coordinate decisions using graph topology to optimize task performance. While their work targets generic graph environments rather than engineering drawings, the formalism suggests a natural extension for SLD analysis: symbols and conductors can be modeled as nodes and edges, enabling agents to learn context-aware decisions about symbol classification and connection interpretation. Such graph-centric reinforcement learning could complement our ESC-YOLOv8 by providing post-detection relational reasoning (e.g., resolving ambiguities in dense regions through topology-aware policies).
Despite advancements, deep learning models like YOLO and SSD struggle with symbol classification in cluttered or occluded engineering drawings, as traditional loss functions lack adequate guidance [44,45]. Challenges like overlapping lines, varying scales, and inconsistent annotations complicate loss function design. Future research should focus on adaptive loss functions to enhance training efficiency and accuracy across diverse datasets [46–51].
Methods
The methodology (as presented in Fig 4) comprises three layers: dataset development, preprocessing, and class imbalance handling. Finally, we benchmark baseline models and introduce our proposed network, evaluating its performance through systematic ablation studies. This comprehensive approach is designed to enhance symbol recognition accuracy in complex engineering drawings [44].
Novel dataset development
For the experiments in this research work, we chose to work with SLDs (Fig 5). This study aims to develop a comprehensive and scalable dataset following established guidelines, specifically tailored to optimize deep learning model training for improved classification accuracy of SLD symbols. By doing so, the performance of deep learning models in recognizing and accurately classifying electrical symbols can be greatly enhanced. The acquired dataset of 6,700 images comprises scanned drawings containing widely used symbols. Additionally, the SLDs are of varying quality, which makes the dataset suitable for evaluation purposes.
Data exploring & preprocessing guidelines.
SLD images are cluttered with text and symbols, often lacking distinctive features and containing noise from scanning. The original SLD sheets are large images, 7500 × 5250 pixels. To speed up the training process, we divided each sheet into a 6 × 4 grid, resulting in 24 sub-images (patches) of approximately 1250 × 1300 pixels, considerably smaller than the original sheets.
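This patching step can be sketched as below; the function name and the choice to truncate remainder pixels from non-divisible dimensions are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def split_into_patches(sheet, rows=4, cols=6):
    """Split a large drawing sheet into a rows x cols grid of patches.

    Remainder pixels from non-divisible dimensions are truncated, which
    is why patch sizes are approximate (an assumption of this sketch).
    """
    h, w = sheet.shape[:2]
    ph, pw = h // rows, w // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            patches.append(sheet[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
    return patches

# A 5250 x 7500 sheet (height x width) yields 24 patches of 1312 x 1250.
sheet = np.zeros((5250, 7500), dtype=np.uint8)
patches = split_into_patches(sheet)
```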
Preprocessing techniques, including (i) gray processing and (ii) text removal, are applied to enhance model performance. Gray processing converts red, green, and blue (RGB) images to grayscale using the weighted average method (Eq 1) [39], while text removal employs Easy Optical Character Recognition (EasyOCR) with thresholds (e.g., OCR confidence > 0.7 for text removal) and in-painting to eliminate non-essential elements, as seen in Fig 6.
This equation represents the grayscale conversion formula, where R, G, and B are the red, green, and blue color channel intensities, respectively, and F is the final grayscale intensity. The weighted coefficients reflect human visual sensitivity to each color channel, giving more weight to green and less to blue. Applying this transformation simplifies the image data by reducing it to a single intensity channel, which enhances computational efficiency and reduces complexity during subsequent preprocessing and model training steps.
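The conversion described above can be sketched as follows. The ITU-R BT.601 luminance coefficients (0.299, 0.587, 0.114) are the standard choice matching the description (most weight to green, least to blue); the exact coefficients of the paper's Eq 1 are assumed here.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted-average grayscale conversion: F = 0.299R + 0.587G + 0.114B.

    rgb: array of shape (..., 3) with channels in R, G, B order.
    The BT.601 weights are an assumption standing in for Eq 1.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```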
Class distribution.
Training a deep learning model requires fully annotated images. To this end, we used RoboFlow to annotate the collection of SLD diagrams. The resulting annotations, covering nine unique classes, were produced in a two-step process: (1) drawing bounding boxes around symbols with unique colors, and (2) assigning class labels, excluding mismatched images. The distribution of classes used in this dataset is detailed in Table 2.
The annotated dataset captures information for nine distinct symbol classes, stored in a file format that includes the x and y coordinates of each symbol’s bounding box center, along with its width and height. A total of 17,085 symbols were labeled across these classes. However, the dataset exhibits significant class imbalance, as illustrated in Table 2. To mitigate the resulting biases and keep the dataset diverse and representative, a carefully designed augmentation pipeline was implemented, specifically targeting the underrepresented SLD classes to increase the presence of minority symbols.
Data augmentation enhances dataset quantity and quality by introducing variability and diversity, crucial for training robust deep learning models. Techniques like geometric flips, brightness, random erasing, and image contrast are responsible for balancing the minor classes to ensure unbiased performance, as outlined in Table 3.
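A minimal sketch of such a pipeline is shown below; the one-transform-per-call design and the specific parameter ranges are illustrative assumptions, not the configuration from Table 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch):
    """Apply one randomly chosen augmentation to a grayscale patch:
    geometric flip, brightness shift, random erasing, or contrast change.
    Parameter ranges are assumptions for illustration."""
    choice = rng.integers(4)
    out = patch.astype(np.float32)
    if choice == 0:                               # horizontal flip
        out = out[:, ::-1]
    elif choice == 1:                             # brightness shift
        out = out + rng.uniform(-30, 30)
    elif choice == 2:                             # random erasing
        h, w = out.shape[:2]
        y, x = rng.integers(h // 2), rng.integers(w // 2)
        out[y:y + h // 4, x:x + w // 4] = 0
    else:                                         # contrast scaling
        out = (out - out.mean()) * rng.uniform(0.8, 1.2) + out.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying such transforms only to patches containing minority-class symbols raises their effective frequency without altering the majority classes.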
Model benchmarking methodology
Benchmarking involves evaluating the performance of various deep learning models on a dataset to ensure unbiased assessment and identify areas for improvement. This section outlines the experimental design, hardware/software environments, and performance metrics used to assess the proposed dataset’s effectiveness.
Establishing baseline accuracy for the new dataset involves evaluating state-of-the-art deep learning models, including YOLO versions (v7 to v10) and YOLO-World. The experiments incorporate both one-stage and two-stage detection models to capture a broader perspective on performance, balancing speed and accuracy, as given below in Table 4.
Hyperparameter configurations.
Each of the listed deep learning models is trained, validated, and tested on the proposed dataset. A standard set of model training parameters [44] is defined and applied in all experiments; the details are presented in Table 5.
Table 5 outlines the training parameters: Stochastic Gradient Descent (SGD) was used as the optimizer for its stability in convergence; an initial learning rate of 0.001 was selected to ensure gradual updates; training ran for 100 epochs with a batch size of 16 to balance learning efficiency and memory constraints. Graphics Processing Unit (GPU) execution accelerated training. An Intersection over Union (IoU) threshold of 0.7 was chosen to enforce stricter localization accuracy, while the maximum number of detections was capped at 300 to limit redundancy. Non-Maximum Suppression (NMS) was disabled to retain overlapping detections, and validation was performed every 50 iterations for timely performance monitoring.
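For reference, these settings can be collected into a single configuration object; the key names below are illustrative, while the values are those reported in the text.

```python
# Training configuration mirroring the settings described above
# (key names are assumptions; values are from the text).
train_config = {
    "optimizer": "SGD",            # stable convergence
    "initial_lr": 0.001,           # gradual updates
    "epochs": 100,
    "batch_size": 16,              # balances efficiency and memory
    "device": "cuda",              # GPU-accelerated training
    "iou_threshold": 0.7,          # stricter localization accuracy
    "max_detections": 300,         # caps redundant predictions
    "use_nms": False,              # retain overlapping detections
    "val_interval_iters": 50,      # timely performance monitoring
}
```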
Hardware & software setup.
A standardized hardware and software environment ensures consistency and reproducibility during benchmarking. The setup includes an Intel Core i9 13900HX CPU, 32GB RAM, NVIDIA GeForce RTX 4090 GPU, Windows 11 Pro, and Python 3.10. This configuration was selected to support efficient training and evaluation of deep learning models with high computational demands.
Proposed deep learning model for symbol classification
This section details the selection and refinement of the reference model based on benchmarking results. The top-performing model is analyzed, fine-tuned, and enhanced with architectural improvements to maximize classification accuracy.
Based on benchmarking results, models are evaluated using F1, recall, and mAP to identify the most effective architecture. Models including YOLOv8, YOLOv10, and YOLO-World are compared, and the model with the best balance of these metrics is selected as the reference for further optimization.
Benchmarking results identified YOLOv8 as the best-performing model based on F1, recall, and mAP. The proposed model builds on YOLOv8 and is named ‘Enhanced Symbol Classification YOLOv8 (ESC-YOLOv8)’; it enhances the attention mechanisms and loss functions for improved feature extraction and classification. Fig 8 illustrates the model architecture. Each modification is introduced incrementally, refining the architecture for optimal performance. The proposed changes are as follows:
Model-1: Hybrid residual attention module.
The HRAM integrates channel, spatial, and input features in parallel, unlike sequential methods such as the Convolutional Block Attention Module (CBAM), enhancing feature extraction efficiency. HRAM (as seen in Fig 9) reduces computational overhead, preserves fine-grained details, and accelerates inference by minimizing layers, making it highly effective for symbol classification in SLDs. Our design draws on the principle of leveraging attention for fine-grained feature extraction, similar to approaches in other domains such as that of Zhao et al. [52], who applied full-domain convolutional attention for cross-lingual font style transfer.
In HRAM, channel attention is computed by applying global average and max pooling across the spatial dimensions, followed by fully connected layers to generate an attention map, represented by Eq 2:

Mc = σ(W1 vavg + W2 vmax)　(Eq 2)
In this equation, Mc represents the channel attention map, which is computed by combining two types of pooled information: the average-pooled feature vector (vavg) and the max-pooled feature vector (vmax). The learnable weights W1 and W2 adjust the contribution of each pooled feature, while the sigmoid activation function sigma (σ) normalizes the combined result to produce an attention map in the range [0,1]. This map highlights the most important feature channels, allowing the model to emphasize critical channel-level information and suppress less relevant channels during feature extraction. Spatial attention is computed by pooling across the channel dimension and applying a convolutional layer, as shown in Eq 3:

Ms = σ(Conv([Favg; Fmax]))　(Eq 3)
In this equation, Ms denotes the spatial attention map, which is generated by first concatenating the average-pooled feature map Favg and the max-pooled feature map Fmax along the channel dimension. This combined feature map is then passed through a convolutional layer (Conv) to capture spatial relationships and local interactions across the feature map. Finally, the sigmoid activation function σ normalizes the output to a range between 0 and 1, producing an attention map that highlights important spatial regions in the feature map, allowing the network to focus on critical spatial patterns during classification. The final feature map is updated by applying the attention maps multiplicatively in Eq 4 [53]:

Fout = F ⊙ Mc ⊙ Ms　(Eq 4)
In this equation, Fout represents the final refined feature map obtained after applying both channel and spatial attention mechanisms. The original input feature map F is element-wise multiplied by the channel attention map Mc and the spatial attention map Ms, effectively reweighing the feature map to emphasize both important channels and critical spatial regions. This combined attention refinement enhances the network’s ability to focus on the most informative features, improving the accuracy and robustness of the symbol classification task, especially in dense and cluttered diagrams.
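The HRAM refinement described by Eqs 2–4 can be sketched in NumPy as below. The weight shapes, and the reduction of the spatial convolution to a single scalar weight, are assumptions made purely to keep the sketch dependency-free; they are not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hram(feature, w1, w2, conv_w):
    """Sketch of HRAM refinement on a (C, H, W) feature map.

    w1, w2: (C, C) weights for the pooled vectors; conv_w: a scalar
    standing in for the spatial convolution (an assumption)."""
    v_avg = feature.mean(axis=(1, 2))          # global average pooling
    v_max = feature.max(axis=(1, 2))           # global max pooling
    m_c = sigmoid(w1 @ v_avg + w2 @ v_max)     # Eq 2: channel attention
    f_avg = feature.mean(axis=0)               # pooling across channels
    f_max = feature.max(axis=0)
    m_s = sigmoid(conv_w * (f_avg + f_max))    # Eq 3: spatial attention
    return feature * m_c[:, None, None] * m_s  # Eq 4: F reweighed by Mc, Ms
```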
Model 2- proximity-aware loss function (PaL).
In dense object detection, standard loss functions such as YOLOv8’s Varifocal Loss (VFL) often fail to distinguish overlapping or closely positioned objects, resulting in merged or ambiguous predictions. To address this limitation, we propose the PaL, Fig 10, which augments VFL with a spatial penalty that discourages predictions with insufficient separation.
Proximity Penalty Term: The Varifocal Loss (VFL) balances confidence scores with ground truth labels, focusing on difficult cases. A proximity penalty term penalizes predictions where bounding boxes are too close, enforcing spatial separation based on a threshold. This ensures distinct object detection, as defined in Eq 5:

Lproximity = λ Σi≠j I(d(Bi, Bj) < dthreshold) / (d(Bi, Bj) + ∊)　(Eq 5)
Here, Bi and Bj represent the bounding boxes of objects i and j, and λ is a scaling factor that adjusts the strength of the penalty. ∊ is a small constant that avoids division by zero, and I(d(Bi, Bj) < dthreshold) is an indicator function that activates when the distance between Bi and Bj falls below the threshold. The PaL hyperparameters were set to λ = 1.2 and a distance threshold of 12 pixels. The penalty grows as the distance between bounding boxes decreases, ensuring that the model penalizes overly close bounding boxes while still maintaining separate detections for both objects.
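The penalty term can be sketched in plain Python as follows. The paper does not specify which inter-box distance is used, so this sketch assumes the Euclidean distance between box centers; the default hyperparameters match the values stated above (λ = 1.2, threshold = 12 pixels).

```python
import math

def center_distance(box_a, box_b):
    """Euclidean distance between centers of two (x1, y1, x2, y2) boxes."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(ax - bx, ay - by)

def proximity_penalty(boxes, lam=1.2, d_threshold=12.0, eps=1e-6):
    """Sum lam / (d + eps) over all box pairs closer than d_threshold."""
    penalty = 0.0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            d = center_distance(boxes[i], boxes[j])
            if d < d_threshold:  # indicator I(d < d_threshold)
                penalty += lam / (d + eps)
    return penalty
```

Well-separated boxes contribute nothing, while the penalty grows sharply as two boxes approach each other.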
Proximity-Aware Loss: The Proximity-aware Loss Function integrates Varifocal Loss and a proximity penalty to improve classification in dense object scenarios. The complete loss function is given in Eq 6:
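Based on the description below, the combined loss can be written as (a reconstruction; the λ scaling is assumed to live inside the penalty term as defined in Eq 5):

```latex
L_{\text{proximity-aware}} = L_{\mathrm{VFL}} + L_{\mathrm{proximity}} \qquad \text{(Eq 6)}
```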
In this equation, Lproximity-aware represents the final loss function used to train the ESC-YOLOv8 model. It combines the standard Varifocal Loss (LVFL), which focuses on balancing classification confidence and localization accuracy, with the proximity penalty term (Lproximity), which enforces spatial separation between closely positioned bounding boxes. By integrating these two components, the model is encouraged to not only improve its classification predictions but also maintain distinct detections in densely packed regions. This combined loss formulation directly addresses the challenges of overlapping or clustered symbols commonly found in SLDs, enhancing both detection robustness and fine-grained localization.
Additionally, the use of a class-weighted formulation of the Varifocal Loss helps mitigate class imbalance by assigning higher weights to underrepresented symbol classes based on their inverse frequency. This ensures that rare classes, such as Ammeter and Generator, contribute more significantly to the loss during training, leading to improved representation and classification performance across all symbol types.
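The inverse-frequency class weighting described above can be sketched as follows. The normalization step (scaling so the average weight is 1) is an assumption for illustration; the paper does not state how the weights are normalized.

```python
from collections import Counter

def inverse_frequency_weights(labels, normalize=True):
    """Per-class weights proportional to the inverse of each class's frequency."""
    counts = Counter(labels)
    weights = {c: 1.0 / n for c, n in counts.items()}
    if normalize:
        # scale so the mean weight is 1, keeping the overall loss magnitude stable
        mean_w = sum(weights.values()) / len(weights)
        weights = {c: w / mean_w for c, w in weights.items()}
    return weights
```

A rare class such as Ammeter thus receives a weight several times larger than a common class such as Switch, amplifying its contribution to the loss.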
This combined loss function encourages the network to correctly classify objects while penalizing predictions that place bounding boxes too close to one another.
Proposed algorithm: ESC-YOLOv8 workflow.
To provide a clear overview of the proposed ESC-YOLOv8 model and guide the implementation process, we outline its full workflow in the form of a step-by-step algorithm in algorithm 1. This algorithm details each phase, from data preparation to model design, training, evaluation, and result analysis, ensuring reproducibility and clarity for both researchers and practitioners working on symbol classification in SLDs.
The presented algorithm offers a structured breakdown of the ESC-YOLOv8 workflow, highlighting how each stage contributes to the overall system performance. By systematically integrating advanced components such as the HRAM and the Proximity-aware Loss Function (PaL), the algorithm ensures that the model is optimized not only for accuracy but also for efficiency and scalability. This formalized representation also facilitates easier adaptation and extension in future work, enabling researchers to build upon the described approach for related tasks in industrial diagram analysis.
Performance assessment of the model
Model performance is evaluated on a test set using F1, recall, mAP, and the confusion matrix, ensuring a thorough assessment of classification accuracy and error analysis.
Precision and recall are the two most commonly used metrics for evaluating a model [54], and their definitions are provided below. In multi-class classification, precision measures the proportion of true positives among all positive predictions, assessing the model’s accuracy in class assignment. It is defined in Eq 7 [39]:
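Following the standard definition:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP} \qquad \text{(Eq 7)}
```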
In multi-class classification, recall measures the model’s ability to identify all instances of a class [52]. It is calculated as the ratio of true positives to the sum of true positives and false negatives, as shown in Eq 8 [39]:
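Following the standard definition:

```latex
\mathrm{Recall} = \frac{TP}{TP + FN} \qquad \text{(Eq 8)}
```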
F1-score is another widely used metric in multi-class classification, especially when evaluating performance on imbalanced datasets. It represents the harmonic mean of precision and recall, providing a balanced measure that accounts for both false positives and false negatives. It is defined in Eq 9 [39]:
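As the harmonic mean of the two preceding metrics:

```latex
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad \text{(Eq 9)}
```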
The confusion matrix summarizes a classification model’s performance. It compares predicted labels against actual labels, with TP, FN, FP, and TN representing True Positive, False Negative, False Positive, and True Negative, respectively.
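The per-class metrics can be derived directly from a confusion matrix, as sketched below. The nested-dict representation (`confusion[actual][predicted] -> count`) is an illustrative choice, not the paper's evaluation code.

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def per_class_metrics(confusion, classes):
    """confusion[actual][predicted] -> count; returns {class: (P, R, F1)}."""
    metrics = {}
    for c in classes:
        tp = confusion.get(c, {}).get(c, 0)
        # false positives: other classes predicted as c
        fp = sum(confusion.get(a, {}).get(c, 0) for a in classes if a != c)
        # false negatives: class c predicted as something else
        fn = sum(n for pred, n in confusion.get(c, {}).items() if pred != c)
        p, r = precision(tp, fp), recall(tp, fn)
        metrics[c] = (p, r, f1_score(p, r))
    return metrics
```

For example, a model that confuses a few motors and voltmeters yields per-class scores that expose exactly which class pair drives the errors.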
Results
This section of the study focuses on various results produced during the development of the dataset and benchmarking experiments. Additionally, the model enhancement results are also presented and discussed.
Model benchmarking results
The dataset’s suitability was evaluated using five deep learning models, benchmarked under standard conditions, with results detailed in the following text.
Table 6 presents benchmark results for SLD symbol classification, with YOLOv8 achieving the highest mAP (89.7%), outperforming YOLOv10 (88.3%), YOLOv9 (86.5%), YOLOv7 (80.6%), and YOLO-World (82.8%), the last of which struggled with dense and overlapping symbols. YOLOv8 excelled in classifying switches (91%) and motors (97.8%) but showed lower mAP for complex symbols such as delta (86.7%) and ammeters (86.0%). These results confirm YOLOv8’s superiority in symbol detection across diverse engineering drawings.
Proposed model results
This section presents the evaluation of the proposed model and its architectural enhancements through a series of experiments. The improvements focus on enhancing feature extraction and symbol localization in dense and complex SLDs. To validate the effectiveness of each modification, a step-by-step ablation study was performed, measuring the impact of attention mechanisms, custom loss functions, and class rebalancing techniques on classification performance.
Ablation study.
To better understand the individual impact of each proposed enhancement, an expanded ablation study was conducted. Table 7 presents the baseline YOLOv8 model and its progressive modifications.
Model-1 enhances the baseline YOLOv8 by introducing improved attention to key symbol features, but it still struggles to separate overlapping symbols such as diodes and deltas. To overcome this limitation, Model-2 integrates a Proximity-aware Loss Function into the YOLOv8 architecture. This addition improves bounding box separation and reduces errors caused by spatial proximity, particularly in densely populated regions. ESC-YOLOv8 is then developed by combining the enhancements of both Model-1 and Model-2, resulting in a more robust and accurate symbol classification system. As shown in Fig 11, the confusion matrix comparison reveals significant improvements in prediction accuracy for ESC-YOLOv8 over the intermediate models.
The proposed ESC-YOLOv8 model integrates attention and a custom loss function, enhancing symbol classification by improving focus, precision, and adaptability, as shown in Fig 12.
As presented in Table 8, the proposed model outperforms existing approaches due to its novel integration of attention mechanisms and a Proximity-aware Loss Function. The findings indicate significant improvements in symbol detection and classification, particularly in densely packed regions. This is evident in the comparative performance metrics, where the proposed model consistently achieves higher accuracy and precision than other models.
Fig 13a highlights several classification errors observed in baseline models. For instance, a motor is misclassified as a voltmeter, and a voltmeter is incorrectly detected as an ammeter. Additionally, a voltmeter near a dense wiring junction is entirely missed, illustrating a failure to detect symbols in cluttered or overlapping regions. These errors are common in dense layouts where symbols are closely spaced and visually similar, leading to confusion in feature extraction and bounding box assignment. In contrast, the proposed model, shown in Fig 13b, demonstrates improved robustness by accurately classifying and localizing symbols with minimal errors, effectively addressing challenges in dense regions through enhanced attention and proximity-aware learning.
Cross-dataset evaluation
To evaluate the robustness and generalization of the proposed ESC-YOLOv8 model, a cross-dataset validation was performed using an independent set of 1200 SLD images sourced from Government College University Hyderabad, Pakistan. The images, covering the same nine symbol classes with diverse layouts and complexities, were ethically acquired, preprocessed, annotated, and augmented to 3600 images to address class imbalance and ensure representativeness. The dataset was split into 80% training, 10% validation, and 10% testing sets. Benchmarking on this dataset with YOLOv8, YOLOv9, and YOLOv10 showed performance drops compared to the original dataset, while ESC-YOLOv8 consistently outperformed these baselines, achieving the highest mAP along with balanced precision and recall, as shown in Table 9. These results confirm that the HRAM and the Proximity-aware Loss Function enhance model robustness and reliability on unseen data, reducing the risks of overfitting and dataset-specific bias, and supporting deployment in real-world industrial applications.
As shown in Table 9, ESC-YOLOv8 achieved the best performance across all evaluation metrics during cross-dataset validation. While YOLOv8, YOLOv9, and YOLOv10 demonstrated competitive accuracy, their performance dropped compared to the original dataset, indicating sensitivity to dataset variations. In contrast, ESC-YOLOv8 maintained higher stability, recording 92.8% F1, 94.1% recall, and 91.7% mAP. These results confirm that the integration of the HRAM and the Proximity-aware Loss Function improves robustness and generalization, enabling the model to perform reliably on unseen data from different sources.
Computational efficiency analysis
The computational efficiency of ESC-YOLOv8 was evaluated against the strongest baseline models identified in the benchmarking study, namely YOLOv8, YOLOv9, and YOLOv10, which demonstrated the highest overall performance in Section 4.1. As presented in Table 10, ESC-YOLOv8 achieves a 19.6% reduction in parameters compared to the baseline while sustaining competitive inference speed. This parameter reduction is further illustrated in Fig 12, which highlights the comparative efficiency between the baseline YOLOv8 and the proposed ESC-YOLOv8. Collectively, these results demonstrate that the proposed model maintains a favorable balance between accuracy and efficiency, reinforcing its suitability for deployment in industrial environments where both precision and resource optimization are critical.
Error patterns were further examined using the confusion matrices in Fig 11 and qualitative visualizations in Fig 13. The main misclassifications were observed between visually similar classes such as voltmeter and motor, and in cases of occlusion by dense connection lines. These errors highlight the challenges of symbol overlap and low contrast, which may be mitigated in future work by incorporating higher resolution patches and advanced preprocessing.
Conclusion
This research proposes ESC-YOLOv8 to enhance the semantic understanding of SLDs through symbol classification, using an HRAM and a Proximity-aware Loss Function that refine feature extraction and improve symbol localization, particularly in densely packed regions. Benchmarking results show an mAP of 93.5%, surpassing YOLOv8’s 89.7%, while reducing model parameters from 11.2 million to 9 million. Cross-dataset evaluations further demonstrate the model’s robustness, achieving 92.8% F1, 94.1% recall, and 91.7% mAP on unseen datasets, whereas YOLOv8, YOLOv9, and YOLOv10 experienced performance drops, confirming that the proposed enhancements improve generalization and resilience to dataset variations. Despite these improvements, limitations remain, including potential performance drops with highly diverse or rare symbol types, sensitivity to real-world diagram noise, annotation inconsistencies, and challenges in scaling to very large diagrams or real-time edge deployment. Future research can address these gaps by expanding the dataset with more diverse diagrams, integrating transformer- or graph-based architectures for improved relational understanding, developing lightweight or adaptive models for scalability and edge applications, and exploring automated annotation methods and advanced cross-domain evaluations to further enhance robustness and generalization.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the author(s) used a Large Language Model (LLM) to survey the literature and better understand existing methods. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
Acknowledgments
This research was funded by Universiti Teknologi PETRONAS (UTP) under the YUTP-PRG (015PBC-070) research grant scheme, with additional funding and institutional support provided by the Faculty of Information and Communication Technology (FICT), Mahidol University.
References
- 1. Fleuret F, Li T, Dubout C, Wampler EK, Yantis S, Geman D. Comparing machines and humans on a visual categorization test. Proc Natl Acad Sci U S A. 2011;108(43):17621–5. pmid:22006295
- 2. Bhanbhro H, Kwang Hooi Y, Kusakunniran W, Amur ZH. A Symbol Recognition System for Single-Line Diagrams Developed Using a Deep-Learning Approach. Applied Sciences. 2023;13(15):8816.
- 3. Bhanbhro H, et al. Modern deep learning approaches for symbol detection in complex engineering drawings. In: Proc 2022 Int Conf Digit Transform Intell (ICDI); 2022.
- 4. Zhao Z-Q, Zheng P, Xu S-T, Wu X. Object Detection With Deep Learning: A Review. IEEE Trans Neural Netw Learn Syst. 2019;30(11):3212–32. pmid:30703038
- 5. Love PED, Zhou J, Matthews J. Systems information modeling: From file exchanges to model sharing for electrical instrumentation and control systems. Automation in Construction. 2016;67:48–59.
- 6. American Society of Mechanical Engineers. The state of mechanical engineering: Today and beyond. New York: ASME; 2012. Available from: https://www.asme.org/wwwasmeorg/media/resourcefiles/campaigns/marketing/2012/the-state-of-mechanical-engineering-survey.pdf
- 7. Intelligent Project Solutions. Transforming legacy drawings into digital assets: Overcoming industry challenges. 2024. Available from: https://ips-ai.com/resource-centre/blogs/transforming-legacy-drawings-into-digital-assets-overcoming-industry-challenges/
- 8. Moreno-García CF, Elyan E, Jayne C. New trends on digitisation of complex engineering drawings. Neural Comput & Applic. 2018;31(6):1695–712.
- 9. Mani S, Dubey SR, Singh SK. Automatic digitization of engineering diagrams using deep learning and graph search. In: Proc IEEE/CVF Conf Comput Vis Pattern Recognit Workshops (CVPRW); 2020. p. 904–5.
- 10. Bhanbhro H, Hooi YK, Hassan Z. Modern approaches towards object detection of complex engineering drawings. In: Proc 2022 Int Conf Digit Transform Intell (ICDI); 2022.
- 11. Elyan E, Jamieson L, Ali-Gombe A. Deep learning for symbols detection and classification in engineering drawings. Neural Netw. 2020;129:91–102. pmid:32502800
- 12. Jamieson L, Francisco Moreno-García C, Elyan E. A review of deep learning methods for digitisation of complex documents and engineering diagrams. Artif Intell Rev. 2024;57(6).
- 13. Tahir MA, Bouridane A, Kurugollu F. Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier. Pattern Recognition Letters. 2007;28(4):438–46.
- 14. Mitterbaur M. A data-driven approach to identifying spare parts suitable for additive manufacturing through the digitization of legacy engineering drawings. 2023. Available from: https://repositum.tuwien.at/handle/20.500.12708/188145
- 15. Mohd Yazed MS, Ahmad Shaubari EF, Yap MH. A Review of Neural Network Approach on Engineering Drawing Recognition and Future Directions. JOIV : Int J Inform Visualization. 2023;7(4):2513.
- 16. Cootes TF, Taylor CJ, Cooper DH, Graham J. Active Shape Models-Their Training and Application. Computer Vision and Image Understanding. 1995;61(1):38–59.
- 17. Liu J, Udupa JK. Oriented active shape models. IEEE Trans Med Imaging. 2009;28(4):571–84. pmid:19336277
- 18. Çiçek S, Ferikoğlu A, Pehlivan İ. A new 3D chaotic system: Dynamical analysis, electronic circuit design, active control synchronization and chaotic masking communication application. Optik. 2016;127(8):4024–30.
- 19. Li Y, Wang X, Zhang Z. Deep learning-based symbol recognition in technical drawings: A case study on single-line diagrams. IEEE Trans Pattern Anal Mach Intell. 2020;42(8):1567–80.
- 20. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137-49.
- 21. Zhang Y, Li X, Wang H. CNN-based symbol recognition in electrical schematics: challenges and solutions. IEEE Trans Ind Informat. 2020;16(5):3456–65.
- 22. Kim J, Park S, Lee T. Real-time symbol detection in single-line diagrams using YOLO. IEEE Access. 2020;8:123456–65.
- 23. Liu X, Chen Y, Wang Z. Symbol classification in P&IDs: A deep learning approach. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2021;51(4):2345–55.
- 24. Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, real-time object detection. In: Proc IEEE Conf Comput Vis Pattern Recognit (CVPR); 2016. p. 779–88.
- 25. Liu W, et al. SSD: Single Shot MultiBox Detector. IEEE Trans Pattern Anal Mach Intell. 2018;40(4):835–47.
- 26. Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In: Proc Int Conf Mach Learn (ICML); 2019. p. 6105–14.
- 27. Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv [Preprint]. 2018 Apr. Available from: https://arxiv.org/abs/1804.02767
- 28. Kumar A, Gupta S, Singh R. Challenges and future directions in deep learning-based symbol classification for engineering drawings. IEEE Trans Neural Netw Learn Syst. 2022;33(5):2101–12.
- 29. Dosovitskiy A, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc Adv Neural Inf Process Syst (NeurIPS); 2021. p. 1–12.
- 30. Qureshi AM, Abdul Haleem B, Abdulwahab A, Naif Al M, Mohammad A, Nouf Abdullah A, et al. Semantic Segmentation and YOLO Detector over Aerial Vehicle Images. Computers, Materials & Continua. 2024;80(2).
- 31. Wang X, Song X, Li Z, Wang H. YOLO-DBS: Efficient Target Detection in Complex Underwater Scene Images Based on Improved YOLOv8. J Ocean Univ China. 2025;24(4):979–92.
- 32. Zhou J, et al. Graph neural networks: A review of methods and applications. IEEE Trans Neural Netw Learn Syst. 2021;32(1):4–24.
- 33. Bento J, Paixão T, Alvarez AB. Performance Evaluation of YOLOv8, YOLOv9, YOLOv10, and YOLOv11 for Stamp Detection in Scanned Documents. Applied Sciences. 2025;15(6):3154.
- 34. Wang Y, Chen X, Liu Z. Contextual loss for improved symbol recognition in technical drawings. IEEE Trans Pattern Anal Mach Intell. 2022;44(8):4567–78.
- 35. Akhtar MU. Missing link prediction in complex networks. Int J Sci Eng Res. 2018;9:82–7.
- 36. Cheng T, et al. YOLO-World: Real-time open-vocabulary object detection. In: Proc IEEE/CVF Conf Comput Vis Pattern Recognit (CVPR); 2024.
- 37. Mashhadi E, Ahmadvand H, Hemmati H. Method-level bug severity prediction using source code metrics and LLMs. In: Proc IEEE Int Symp Softw Rel Eng (ISSRE); 2023.
- 38. Muzammul M, Li X. Comprehensive review of deep learning-based tiny object detection: challenges, strategies, and future directions. Knowl Inf Syst. 2025;67(5):3825–913.
- 39. Chen S, Liu Y, Yang M. Adaptive loss functions for improved symbol recognition in complex engineering drawings. IEEE Trans Neural Netw Learn Syst. 2023;34(6):1234–45.
- 40. Ahmadvand H, Goudarzi M, Foroutan F. Gapprox: using Gallup approach for approximation in Big Data processing. J Big Data. 2019;6(1).
- 41. Bhanbhro H, Hooi YK, Zakaria MNB, Hassan Z, Pitafi S. Single-line electrical drawings (SLED): A multiclass dataset benchmarked by deep neural networks. In: Proc IEEE 13th Int Conf Syst Eng Technol (ICSET); 2023. p. 66–71.
- 42. Ikram S, Sarwar Bajwa I, Gyawali S, Ikram A, Alsubaie N. Enhancing Object Detection in Assistive Technology for the Visually Impaired: A DETR-Based Approach. IEEE Access. 2025;13:71647–61.
- 43. Liu J, Jiang G, Chu C, Li Y, Wang Z, Hu S. A formal model for multiagent Q-learning on graphs. Sci China Inf Sci. 2025;68(9).
- 44. Bhanbhro H, et al. Symbol detection in a multi-class dataset based on single-line diagrams using deep learning models. Int J Adv Comput Sci Appl. 2023;14(8).
- 45. Moorthy S, et al. Hybrid multi-attention transformer for robust video object detection. Eng Appl Artif Intell. 2025;139:109606.
- 46. Huang Z, Shen Y, Zhou M, Chen M, Yang H, Li S, et al. High spatial resolution infrared measurement method for transient temperature field based on 3D-SwinIR super-resolutions. Rev Sci Instrum. 2025;96(4):045107. pmid:40261104
- 47. Goh KW, Surono S, Afiatin MF, Mahmudah KR, Irsalinda N, Chaimanee M, et al. Comparison of activation functions in convolutional neural network for poisson noisy image classification. Emerg Sci J. 2024;8(2):592–602.
- 48. Worachairungreung M, Kulpanich N, Sae-ngow P, Thanakunwutthirot K, Anurak K, Hemwan P. Classification of Coconut Trees Within Plantations from UAV Images Using Deep Learning with Faster R-CNN and Mask R-CNN. J Hum Earth Future. 2024;5(4):560–73.
- 49. Alhawsawi AN, Khan SD, Rehman FU. Enhanced YOLOv8-Based Model with Context Enrichment Module for Crowd Counting in Complex Drone Imagery. Remote Sensing. 2024;16(22):4175.
- 50. Khan SD, Alarabi L, Basalamah S. A unified deep learning framework of multi-scale detectors for geo-spatial object detection in high-resolution satellite images. Arab J Sci Eng. 2022;47(8):9489–504.
- 51. He K, Gkioxari G, Dollar P, Girshick R. Mask R-CNN. IEEE Trans Pattern Anal Mach Intell. 2020;42(2):386–97. pmid:29994331
- 52. Zhao H, Ji T, Rosin PL, Lai Y-K, Meng W, Wang Y. Cross-lingual font style transfer with full-domain convolutional attention. Pattern Recognition. 2024;155:110709.
- 53. Wang J, Zhang L, Li H. Challenges in symbol classification for cluttered engineering drawings: A study on loss function limitations. IEEE Trans Image Process. 2022;31:5678–90.
- 54. Guan S, Lin Y, Lin G, Su P, Huang S, Meng X, et al. Real-Time Detection and Counting of Wheat Spikes Based on Improved YOLOv10. Agronomy. 2024;14(9):1936.
- 55. Bhanbhro H, Hooi YK, Zakaria MNB, Kusakunniran W, Amur ZH. MCBAN: A Small Object Detection Multi-Convolutional Block Attention Network. CMC. 2024;81(2):2243–59.