Abstract
As power transformers are key power system equipment, real-time monitoring and accurate assessment of their operating status are critical for grid safety. To address deficiencies of existing methods in multi-source heterogeneous data fusion, unstructured information processing, and small-sample fault recognition, this paper proposes a multi-source deep feature extraction and fusion method combining two-dimensional convolutional neural networks (2D-CNN) and bidirectional long short-term memory networks (BiLSTM) for online transformer status monitoring. A multi-channel sensor platform collects key variables (e.g., partial discharge acoustic signals, voltage, current, oil-dissolved gas concentrations) to acquire and standardize spatial and temporal features. A dual-channel model is designed: the 2D-CNN extracts spatial features from images and the BiLSTM captures temporal dependencies; an attention mechanism weights and fuses these features, followed by a fully connected layer. A Softmax classifier with ensemble learning performs state discrimination to enhance stability and generalization. Experiments classifying five typical states on real data from transformers at Dongwu Substation in the East China power grid show that the method outperforms traditional and single-modal deep models in accuracy, achieving effective transformer status monitoring.
Citation: Zhang W, Liu L, Chen C, Wei S, Zhang Y, Huo Z, et al. (2026) Power transformer online monitoring and state assessment method based on 2DCNN-BiLSTM multi-source feature fusion. PLoS One 21(4): e0345949. https://doi.org/10.1371/journal.pone.0345949
Editor: Shaofei Wu, Wuhan Institute of Technology, CHINA
Received: November 10, 2025; Accepted: March 12, 2026; Published: April 20, 2026
Copyright: © 2026 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: This work was supported by Technology Project of Research and Application of Transformer Panoramic Monitoring and Operation Status Evaluation for State Grid Chengde Luanping County Power Supply Company in 2025 (Grant No. B3016625Z006).
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Transformers play a crucial role in the operation of power systems, affecting the safety and reliability of power supply. Once a transformer fails, it can compromise the safe operation of the power system and even lead to widespread outages. With the continuous increase in electricity demand and the steady growth of load, the reliable operation of transformers is essential to the power supply reliability of the system. Therefore, achieving online monitoring and assessment of transformer status is of great significance for preventing power system failures and ensuring supply reliability. Traditional transformer condition monitoring often relies on a single sensor collecting a specific physical quantity, such as temperature, voltage, current, or dissolved gas content, and uses empirical formulas or thresholds to judge the condition, which suffers from low information utilization and low accuracy [1]. In recent years, with the rapid development of sensor and information technology, transformer online monitoring systems have gradually achieved synchronous collection of multi-source data [2], including heterogeneous data such as temperature sensing, electrical parameter measurements, dissolved gas analysis (DGA), infrared thermal imaging, and voiceprint signals [3,4]. The fusion of multi-source data provides technical support for in-depth exploration of the intrinsic correlations of transformer operating status. Deep learning has demonstrated outstanding feature-learning capabilities in fields such as image recognition, speech processing, and time-series analysis.
At present, scholars have made significant progress in the field of transformer condition assessment in multiple aspects. In terms of online monitoring technology, Manoj et al. [5] combined the Analytic Hierarchy Process with Grey Relational Analysis to construct a fuzzy assessment model for transformer health status, effectively addressing the uncertainty issues in the assessment process. Schiewaldt et al. [6] evaluated the application effect of UHF frequency range in transformer fault classification, providing a theoretical basis for optimizing partial discharge detection technology. Liu et al. [7] combined Raman spectroscopy technology and optimized support vector machines to establish an assessment method for the aging state of transformer oil paper insulation, achieving accurate judgment of the degree of insulation system degradation. In terms of deep learning applications, researchers have introduced advanced neural network technology into the field of transformer state monitoring. Wang et al. [8] proposed interactive transformers and CNN networks for fusion classification of hyperspectral and LiDAR data, demonstrating the advantages of deep learning in multimodal data processing. Lu et al. [9] achieved degradation trend measurement of aircraft engine feature hierarchy fusion based on parameter adaptive VMD method and improved transformer model. Liu et al. [10] proposed a digital twin assembly model based on visual transformers for multi-source mutation fusion, providing a new approach for complex system state monitoring. Shang et al. [11] combined SGMD approximate entropy feature extraction technology with optimized BiLSTM network to construct a partial discharge fault diagnosis model for transformers, significantly improving the accuracy of fault identification. Liu et al. 
[12] proposed a dual-transformer BiLSTM network, which improves the robustness of speech features through a dual-transformer structure and provides a new method for speech emotion recognition. Wang et al. [13] used an LSTM model for dynamic early fault prediction of power transformers, achieving early fault warning. Guangliang et al. [14] demonstrated the advantages of temporal modeling by using transformers and stacked BiLSTMs for multi-channel, multi-step spectrum prediction. Pentsos et al. [15] proposed a hybrid LSTM-transformer model for power load forecasting and verified the effectiveness of the hybrid architecture. Mitiche et al. [16] used LSTM autoencoders for data-driven anomaly detection in high-voltage transformer bushings, providing a new technological means for equipment health monitoring.
In the research of transformer state assessment methods, scholars have promoted technological innovation from multiple perspectives. The assessment method based on subsystem measurement improves accuracy and interpretability through data and knowledge fusion [17]. The fuzzy assessment model combining the Analytic Hierarchy Process and Grey Relational Analysis effectively addresses uncertainty in assessment [18,19]. The mixed failure mode analysis framework has achieved quantification of component risk priority in a hesitant fuzzy environment [20], providing a scientific basis for maintenance strategies. However, existing methods still fall short in multi-source heterogeneous data fusion, unstructured information processing, and small-sample fault recognition, and cannot effectively exploit multi-source heterogeneous data for transformer operating status assessment, resulting in low accuracy.
In response to the above issues, this paper proposes a multi-source feature fusion transformer online monitoring and state assessment method based on two-dimensional convolutional neural networks (2D-CNN) and bidirectional long short-term memory networks (BiLSTM). In Section 2, a multi-sensor data acquisition system is constructed to provide synchronous acquisition and processing of multi-source data. In Section 3, a 2DCNN-BiLSTM feature extraction model is designed to extract spatial and temporal features. In Section 4, a transformer state assessment model is proposed, which uses a fully connected layer combined with an attention mechanism to achieve efficient fusion of multi-source features, effectively addressing the small-sample problem, with a Softmax layer completing the classification task. In Section 5, operational data of a 1000 kV transformer are obtained, and the assessment of transformer health status by various methods is compared and analyzed, verifying the effectiveness and accuracy of the proposed method. Finally, concluding remarks are given in Section 6.
2 Multi-sensor state monitoring system
2.1 Transformer structure and common types of faults
As an important equipment in the power system, transformers are mainly composed of iron cores, windings, insulation systems, oil tanks, cooling systems, tap changers, and bushings. Transformers may malfunction during operation due to their own or environmental factors (as shown in Fig 1), leading to system abnormalities or failures.
Transformers face various environmental and operational changes during operation, and are prone to typical fault types including winding failures, insulation degradation, abnormal iron cores, oil degradation, and cooling system failures. Winding faults are usually caused by overload, electrical shock, or insulation failure, and manifest as local overheating, inter-turn short circuits, and an increase in combustible gas concentration in the oil. Insulation systems are prone to performance degradation under long-term thermal stress and humid environments, leading to a decrease in insulation resistance and an increase in dielectric loss factor, which can further cause partial discharge and even breakdown accidents. Iron core faults are mostly caused by structural looseness or abnormal grounding, resulting in local magnetic loss, increased heat generation, and significant noise. The quality of the insulating oil directly affects the cooling and electrical insulation performance of the system, and its deterioration is often accompanied by an increase in acid value, a decrease in breakdown voltage, and abnormal gas composition. Cooling system failures such as fan damage, oil pump failure, or blocked heat dissipation channels seriously weaken the heat exchange capacity, causing a rise in oil temperature and system overheating.
Because the above fault types often exhibit multi-source heterogeneous characteristics, they are difficult to distinguish at an early stage, and a monitoring method based on a single physical quantity can hardly reflect the operating status of the equipment comprehensively. Therefore, establishing a state monitoring system based on multi-source sensing and deep feature integration is of great significance for improving the safety of transformer operation and the reliability of the power system.
2.2 Architecture and composition of monitoring system
According to the functions of each module in the transformer system, this article divides the transformer status monitoring system into a three-layer monitoring architecture of “data acquisition layer – data processing layer – data analysis layer,” as shown in Fig 2.
Communication between layers is achieved through standardized interfaces, forming a complete information loop from low-level data collection to high-level intelligent analysis. The data acquisition layer transmits the operational status data of different transformer components, monitored by multiple types of sensors, to the data processing layer; real-time preprocessing of multi-source data in the data processing layer significantly reduces data transmission pressure and improves system robustness; finally, the data analysis layer uses a hybrid 2D-CNN and BiLSTM architecture to extract and fuse key multimodal features, achieving accurate recognition and prediction of transformer states.
3 Feature extraction model based on 2D-CNN and BiLSTM
This article proposes a deep hybrid feature extraction framework that combines two-dimensional convolutional neural networks (2D-CNN) and bidirectional long short-term memory networks (BiLSTM), and constructs an end-to-end multimodal state assessment model by combining fully connected layers and attention mechanisms.
3.1 Design of 2D-CNN spatial feature extraction module
In the multi-source transformer monitoring system, image data such as infrared thermal images and voiceprint spectrograms have a typical two-dimensional spatial structure, making them suitable for convolutional neural networks that extract high-order representations from local regions. To make full use of the spatial information in image data, this part constructs an image feature extraction module based on a two-dimensional convolutional network (2D-CNN). The infrared thermal image reflects the temperature distribution of windings, bushings, oil tanks, and other parts, and is an important basis for identifying thermal faults. By processing the acoustic and mechanical vibration signals with the short-time Fourier transform (STFT) and Mel filter banks, a time-frequency distribution feature map (voiceprint spectrogram) is obtained; its joint time-frequency characteristics are suitable for depicting partial discharge peaks or abnormal vibration modes.
Let the discrete acoustic signal be x[n], the window function w[n], the window length N, and the sliding step S. The STFT is

$$X_a(m,k)=\sum_{n=0}^{N-1} x[mS+n]\,w[n]\,e^{-j2\pi kn/N}$$

Where m denotes the index of the current frame, X_a(m, k) is the complex STFT coefficient of the m-th frame at the k-th frequency bin, and j is the imaginary unit, with j² = −1.
Calculate the squared magnitude of the spectrum and take the logarithm to construct the logarithmic power spectrum, as shown below:

$$\mathrm{LogSpec}(m,k)=\log\!\left(\lvert X_a(m,k)\rvert^2+\varepsilon\right)$$

Among them, LogSpec(m, k) is the logarithmic power spectrum value of the m-th frame at the k-th frequency bin, and ε is a small constant that prevents taking the logarithm of zero.
Subsequently, a Mel filter bank is introduced to weight each frame of the spectrogram, obtaining a spectral response closer to human auditory perception. The response of the m-th triangular filter at frequency bin k is:

$$H_m(k)=\begin{cases}0, & k<f(m-1)\\[2pt] \dfrac{k-f(m-1)}{f(m)-f(m-1)}, & f(m-1)\le k\le f(m)\\[6pt] \dfrac{f(m+1)-k}{f(m+1)-f(m)}, & f(m)<k\le f(m+1)\\[2pt] 0, & k>f(m+1)\end{cases}$$

Where Hm(k) is the response value of the m-th filter at the k-th frequency bin, and f(m) is the center frequency corresponding to the m-th filter.
The final Mel spectrogram is obtained by weighting and accumulating the filters over each frequency band, as shown below:

$$\mathrm{MelSpec}(t,m)=\sum_{k} H_m(k)\,\lvert X_a(t,k)\rvert^2$$

Where MelSpec(t, m) is the output of the m-th Mel filter in the t-th frame, and Hm(k) is the weight coefficient of the m-th filter at frequency bin k.
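The STFT, logarithmic power spectrum, and Mel weighting steps above can be sketched end to end in NumPy. This is a minimal illustration, not the authors' implementation; the window type (Hann), FFT size, hop, sample rate, and filter count are hypothetical choices for demonstration.

```python
import numpy as np

def stft_log_power(x, n_fft=256, hop=64, eps=1e-10):
    """Frame the signal, apply a Hann window, and compute
    LogSpec(m, k) = log(|X_a(m, k)|^2 + eps)."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop : m * hop + n_fft] * w for m in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)            # X_a(m, k), k = 0..n_fft/2
    return np.log(np.abs(spec) ** 2 + eps)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filters H_m(k) over the rfft frequency bins."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):                     # rising slope of the triangle
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                     # falling slope of the triangle
            fb[m - 1, k] = (r - k) / max(r - c, 1)
    return fb

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)                   # synthetic acoustic signal
logspec = stft_log_power(x)                       # (frames, n_fft/2 + 1)
mel = np.exp(logspec) @ mel_filterbank(40, 256, sr).T   # MelSpec(t, m)
print(logspec.shape, mel.shape)
```

The resulting `mel` array is the two-dimensional voiceprint image that the text describes feeding into the 2D-CNN channel.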
The obtained two-dimensional image is the voiceprint spectrogram, which can be combined with the infrared image as image input and fed into the 2D-CNN module for spatial feature extraction. All images are uniformly cropped to size H×W and normalized before input. The 2D-CNN network structure adopts stacked “convolutional layer – activation function – pooling layer” modules to extract local spatial features of the image layer by layer. The core convolution calculation is expressed as:

$$y_k^{(l)}(i,j)=\sigma\!\left(\sum_{c}\sum_{u=0}^{K-1}\sum_{v=0}^{K-1} w_{k,c}^{(l)}(u,v)\,x_c(i+u,\,j+v)+b_k^{(l)}\right)$$

Where x_c is the input image tensor on the c-th channel, w_{k,c}^{(l)} represents the weight of the k-th convolution kernel in the l-th layer on the c-th channel, b_k^{(l)} is the bias term, K is the size of the convolution kernel, (i, j) are the image position coordinates, and σ(·) is the nonlinear activation function.
After convolution, a pooling operation compresses the spatial dimension, reduces computational complexity, and enhances the translation invariance of the features. The output is finally flattened into a one-dimensional vector, which serves as the spatial feature representation of the image channel for the subsequent multimodal fusion module. In addition, to improve training stability and generalization, a Batch Normalization layer is added after the convolutional layers to standardize the activation outputs, and Dropout is introduced in some layers to suppress overfitting.
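The per-position convolution sum, ReLU activation, pooling, and flattening described above can be sketched as follows. This is an illustrative NumPy loop version under assumed toy shapes (one input channel, four 3×3 kernels), not the paper's network configuration.

```python
import numpy as np

def conv2d(x, w, b, act=lambda z: np.maximum(z, 0.0)):
    """Valid 2D convolution of a (C, H, W) input with kernels (K_out, C, K, K),
    followed by ReLU, mirroring the per-position weighted sum in the text."""
    c_in, h, wd = x.shape
    k_out, _, k, _ = w.shape
    out = np.zeros((k_out, h - k + 1, wd - k + 1))
    for kk in range(k_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[kk, i, j] = np.sum(w[kk] * x[:, i:i + k, j:j + k]) + b[kk]
    return act(out)

def max_pool2d(x, p=2):
    """Non-overlapping p x p max pooling applied per channel."""
    c, h, w = x.shape
    x = x[:, : h - h % p, : w - w % p]
    return x.reshape(c, h // p, p, w // p, p).max(axis=(2, 4))

rng = np.random.default_rng(0)
img = rng.standard_normal((1, 32, 32))           # one-channel toy input image
feat = max_pool2d(conv2d(img, rng.standard_normal((4, 1, 3, 3)), np.zeros(4)))
flat = feat.reshape(-1)                          # flattened spatial feature vector
print(feat.shape, flat.shape)
```

In practice a deep-learning framework performs this convolution vectorized on GPU; the loop form here only makes the index arithmetic of the formula explicit.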
3.2 BiLSTM time series modeling module
3.2.1 Principle of BiLSTM network structure.
In the intelligent sensing system for transformer status, time-series monitoring data such as winding temperature, voltage and current waveforms, and changes in gas concentration in oil contain dynamic evolution information of equipment operating status, with significant temporal correlation and long-term dependence structure. However, traditional feedforward neural networks are unable to effectively model such dynamic relationships, making it difficult to capture local abnormal fluctuations or non-stationary behavior. To this end, this article introduces a bidirectional long short-term memory network (BiLSTM) to deeply explore the evolution patterns and fault precursor features in temporal signals.
3.2.2 Temporal feature extraction mechanism.
Let the standardized time-series sample sequence be:

$$X=\{x_1,x_2,\dots,x_T\},\qquad x_t\in\mathbb{R}^d$$

Where T is the length of the time series, and x_t is the observation vector at the t-th time step, of dimension d.
The BiLSTM network performs bidirectional encoding on the processed time series, where the output of step t is represented as:

$$h_t=\left[\overrightarrow{h_t}\,;\,\overleftarrow{h_t}\right]\in\mathbb{R}^{2h}$$

Where $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ respectively represent the hidden-state outputs of the forward and backward LSTM, [·;·] represents the concatenation operation, and h is the hidden-unit dimension in each direction. Each direction follows the standard LSTM gate updates:

$$f_t=\sigma(W_f[h_{t-1},x_t]+b_f),\quad i_t=\sigma(W_i[h_{t-1},x_t]+b_i),\quad o_t=\sigma(W_o[h_{t-1},x_t]+b_o)$$
$$\tilde{c}_t=\tanh(W_c[h_{t-1},x_t]+b_c),\quad c_t=f_t\odot c_{t-1}+i_t\odot\tilde{c}_t,\quad h_t=o_t\odot\tanh(c_t)$$

Where W and b are trainable parameters, tanh is the hyperbolic tangent function, and the symbol ⊙ represents element-wise multiplication.
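A minimal NumPy sketch of one LSTM step and the bidirectional concatenation can make the gate mechanics concrete. The dimensions are toy values and, for brevity, the backward direction reuses the same random weights — a real BiLSTM trains separate parameters per direction.

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: gates f, i, o and candidate c~ from [h_{t-1}, x_t],
    then c_t = f⊙c_{t-1} + i⊙c~ and h_t = o⊙tanh(c_t)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    h = len(h_prev)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    f, i, o = sig(z[:h]), sig(z[h:2 * h]), sig(z[2 * h:3 * h])
    c_tilde = np.tanh(z[3 * h:])
    c_t = f * c_prev + i * c_tilde
    return o * np.tanh(c_t), c_t

def run_lstm(xs, W, b, h_dim):
    """Unroll the cell over a sequence, collecting hidden states."""
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    hs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(1)
T, d, h_dim = 16, 3, 8                            # toy sequence length and dims
xs = rng.standard_normal((T, d))
W = 0.1 * rng.standard_normal((4 * h_dim, h_dim + d))
b = np.zeros(4 * h_dim)
h_fwd = run_lstm(xs, W, b, h_dim)                 # forward pass
h_bwd = run_lstm(xs[::-1], W, b, h_dim)[::-1]     # backward pass, re-reversed
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)     # h_t = [→h_t ; ←h_t]
print(h_bi.shape)
```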
3.3 Feature fusion and attention mechanism design
Global average pooling is used to reduce the feature dimension and enhance the modeling of the global temporal trend; the bidirectional hidden-state outputs of all time steps are integrated to obtain the final temporal-channel feature, represented as follows:

$$f_{seq}=\frac{1}{T}\sum_{t=1}^{T}h_t$$

Among them, each physical quantity is as described in equations (6) and (7).
The evolution trend and dynamic dependency characteristics of the monitoring signal are effectively encoded in the vector fseq, which serves as the global embedding representation of the temporal channel and provides high-quality input for subsequent multimodal feature fusion and state recognition. After obtaining the image feature vector and the temporal feature vector, fusion modeling is required to achieve collaborative perception of multi-source information. This article adopts a feature-level fusion strategy based on concatenation:

$$f_{fused}=\Phi\!\left(W_f\left[f_{img}\,;\,f_{seq}\right]+b_f\right)$$

Where Ф(·) is the ReLU activation function, and Wf and bf are the fusion-layer weights.
To enhance the model's responsiveness to key features, this paper introduces a channel attention mechanism at the feature-fusion stage and designs an attention weighting scheme around the multi-source information structure. Specifically, let the spatial feature vector extracted by the 2D-CNN be f_img and the temporal feature vector extracted by the BiLSTM be f_seq; the two are first concatenated to obtain the fused feature matrix:

$$F=\left[f_{img}\,;\,f_{seq}\right]$$

Subsequently, a set of trainable channel attention weights w_a is defined and normalized using the Softmax function to obtain the weighting coefficients of each fused channel:

$$\alpha=\mathrm{Softmax}(w_a)$$

The weighted feature representation after final fusion is:

$$f_{att}=\alpha\odot F$$
This channel attention mechanism can adaptively learn the importance of various feature channels during the training process, highlighting key dimensions and suppressing redundant features, thereby improving the classification accuracy and generalization ability of the model.
A multi-head attention mechanism is further introduced to extend the channel weighting process above. Each attention head learns a different feature subspace and computes its weighting coefficients independently; the heads are finally concatenated to yield a richer semantic representation. This strategy effectively captures the interactions between multi-scale, heterogeneous data, enhancing the expressive power of the feature fusion and the interpretability of the model. The fused feature vector is fed into a fully connected layer for nonlinear mapping, and state recognition is finally performed by a Softmax classifier.
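The concatenate-then-reweight step of the channel attention fusion can be sketched as below. The feature dimensions and the random attention scores are illustrative stand-ins; in the model, w_att would be trained jointly with the network.

```python
import numpy as np

def softmax(z):
    """Numerically stable Softmax normalization."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_fuse(f_img, f_seq, w_att):
    """Concatenate the two channel feature vectors, turn the trainable
    scores w_att into normalized weights via Softmax, and reweight
    each fused dimension (channel-wise attention)."""
    f = np.concatenate([f_img, f_seq])           # fused feature matrix F
    alpha = softmax(w_att)                       # weighting coefficients α
    return alpha * f, alpha                      # f_att = α ⊙ F

rng = np.random.default_rng(2)
f_img = rng.standard_normal(4)                   # toy 2D-CNN feature vector
f_seq = rng.standard_normal(4)                   # toy BiLSTM feature vector
w_att = rng.standard_normal(8)                   # trainable attention scores
f_att, alpha = attention_fuse(f_img, f_seq, w_att)
print(f_att.shape, round(float(alpha.sum()), 6))
```

A multi-head variant would repeat `attention_fuse` with separate score vectors per head and concatenate the resulting weighted features.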
3.4 Design of state classifier and fault recognition module
The multimodal fused features are fed into the fully connected layer and mapped to the class-probability space through the Softmax activation function:

$$\hat{p}_k=\frac{e^{z_k}}{\sum_{j=1}^{Z} e^{z_j}}$$

Finally, the cross-entropy loss function is used to optimize the mapping from feature vectors to fault categories, achieving automatic recognition of the operating state:

$$L=-\sum_{k=1}^{Z} y_k \log \hat{p}_k$$
Formulas (14) and (15) represent the output layer and loss calculation of the classification model. Firstly, the output of the fully connected layer is converted into probability distributions for each category using the Softmax activation function. Then, the cross entropy loss function is used to measure the difference between the predicted probability and the true label. By minimizing this loss, the model parameters are optimized to achieve accurate classification prediction of transformer states.
3.5 Integrated algorithm
The ensemble learning module uses a soft-voting strategy to take a weighted average of the classification probabilities produced over multiple training runs, avoiding the impact of single-prediction fluctuations on the classification decision. Each sub-model's voting weight is assigned dynamically according to its accuracy on the validation set, improving the stability and robustness of the final judgment. With three models initialized with different parameters, the output category probability vectors are averaged as:

$$P_{final}=\sum_{i=1}^{3}\omega_i P_i$$

Where ωi is the model weight. The final predicted category is the label corresponding to the maximum probability, as shown below:

$$\hat{y}=\arg\max_k P_{final}(k)$$
The ensemble algorithm can effectively reduce the bias and overfitting of individual models and improve the overall prediction performance.
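The weighted soft-voting rule above reduces to a few lines of NumPy. The three probability vectors and the validation-accuracy weights below are made-up toy values for illustration only.

```python
import numpy as np

def soft_vote(probs, weights):
    """Weighted average of per-model class probability vectors;
    the final label is the argmax of the averaged distribution."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize model weights ω_i
    avg = np.tensordot(weights, np.asarray(probs), axes=1)
    return avg, int(np.argmax(avg))

# three sub-models' Softmax outputs over five states (toy values)
p1 = [0.70, 0.10, 0.10, 0.05, 0.05]
p2 = [0.40, 0.35, 0.10, 0.10, 0.05]
p3 = [0.55, 0.20, 0.15, 0.05, 0.05]
# weights taken from each model's (hypothetical) validation accuracy
avg, label = soft_vote([p1, p2, p3], weights=[0.90, 0.80, 0.85])
print(label, round(float(avg.sum()), 6))
```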
4 Transformer condition assessment model based on integrated algorithm
Building on the multimodal spatial and temporal features extracted by the 2D-CNN and BiLSTM modules, this section designs a state assessment model combining the multi-head attention mechanism with ensemble learning strategies. Through an end-to-end training process, a task-adaptive loss function and an ensemble optimization mechanism are introduced to realize efficient classification and intelligent perception of the transformer operating state under multi-source input.
4.1 Model training process
The overall training process of the state evaluation model proposed in this paper is shown in Fig 3. It mainly includes four stages: ① Multi-source data input and preprocessing: obtain information such as the voiceprint map, gas concentrations, voltage, and current during transformer operation, and construct standard image-type and time-series-type inputs through normalization, cropping, STFT conversion, and other operations. ② Multimodal feature extraction: the 2D-CNN module extracts image spatial features, the BiLSTM module extracts temporal evolution features, and the fusion vector is constructed through feature concatenation and the attention mechanism. ③ Classifier training and ensemble optimization: feed the fusion vector into multiple base classifiers, construct multiple sub-model outputs, and use the soft-voting strategy to generate the final prediction label. ④ Model assessment and feedback adjustment: metrics such as cross-validation and the confusion matrix are introduced to assess model performance, and the model structure and hyperparameter configuration are adjusted accordingly.
4.2 Loss function and algorithm optimization
4.2.1 Definition of classification loss function.
The state recognition task is essentially a multi-class classification problem. The cross-entropy loss function is defined as follows:

$$L=-\sum_{z=1}^{Z} y_z \log \hat{p}_z$$

Where Z is the number of state categories, y_z ∈ {0, 1} is the one-hot encoding of the actual label, and $\hat{p}_z$ is the model's predicted probability for the z-th category.
To enhance sensitivity to minority classes, a category weighting strategy is introduced to obtain the weighted cross-entropy loss, expressed as follows:

$$L_w=-\sum_{z=1}^{Z} w_z\, y_z \log \hat{p}_z$$

Where w_z is the weight of the z-th category, set as the reciprocal of its sample frequency.
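The reciprocal-frequency weighting can be checked with a small numeric example. The class proportions below are the ones reported in Section 5.1; the predicted distribution is a made-up toy value, and the weight normalization (scaling so the weights average to one) is an assumed convention.

```python
import numpy as np

def weighted_cross_entropy(p, y_onehot, class_freq, eps=1e-12):
    """Weighted CE: w_z set from the reciprocal of class frequency so
    that minority fault classes contribute more to the loss."""
    w = 1.0 / np.asarray(class_freq, dtype=float)
    w = w / w.sum() * len(w)                     # normalize weights to mean 1
    return float(-np.sum(w * y_onehot * np.log(p + eps)))

# five-class example: predicted distribution vs. true label C5 (rare class)
p = np.array([0.05, 0.05, 0.05, 0.05, 0.80])
y = np.array([0, 0, 0, 0, 1])
freq = [0.45, 0.20, 0.15, 0.12, 0.08]            # class proportions, Section 5.1
loss_weighted = weighted_cross_entropy(p, y, freq)
loss_plain = float(-np.sum(y * np.log(p)))
print(loss_weighted > loss_plain)                # rare class is up-weighted
```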
4.2.2 Optimizer and regularization strategy.
To improve the convergence efficiency and generalization ability of the model, the Adam algorithm is selected for training, combining first-order momentum with second-order moment correction to improve gradient updates. Dropout is added after the fully connected layer to prevent overfitting, and an L2 regularization term is added, as shown below:

$$L_{total}=L+\lambda\lVert\theta\rVert_2^2$$

Where θ is the set of model parameters and λ is the regularization-strength hyperparameter.
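One Adam update with the L2 penalty folded into the gradient can be sketched as follows. The learning rate and λ here are toy values for the demonstration (the paper trains with a learning rate of 0.0001), and the quadratic objective is purely illustrative.

```python
import numpy as np

def adam_l2_step(theta, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999,
                 eps=1e-8, lam=1e-4):
    """One Adam step on L_total = L + λ||θ||²: g ← ∇L + 2λθ, then
    bias-corrected first- and second-moment estimates."""
    g = grad + 2.0 * lam * theta                 # gradient of the L2 term added
    m = b1 * m + (1 - b1) * g                    # first-order momentum
    v = b2 * v + (1 - b2) * g * g                # second-order moment
    m_hat = m / (1 - b1 ** t)                    # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(θ) = ||θ||² from a toy start: θ should shrink toward zero
theta = np.array([1.0, -2.0])
m = v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_l2_step(theta, 2.0 * theta, m, v, t)
print(float(np.linalg.norm(theta)) < 0.5)
```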
To further improve the robustness and generalization performance of the model, this paper uses the soft-voting method for the final transformer state classification, taking the category with the maximum probability as the final result.
4.3 Overall assessment framework
To comprehensively verify the effectiveness of the proposed 2D-CNN-BiLSTM multi-source feature fusion method in transformer condition assessment and to ensure the rigor and credibility of the experimental results, the assessment process constructed in this paper is shown in Fig 4.
5 Example analysis
5.1 Experimental dataset and sample construction
To verify the effectiveness of the multi-source deep fusion state assessment model proposed in this paper, this section selects a 1000 kV UHV main transformer in Dongwu Substation of the East China power grid as the test object for multi-source online monitoring data acquisition and modeling experiments. The collected monitoring data include dissolved-gas-in-oil analysis (DGA) data for seven typical combustible gases (H2, CH4, C2H2, C2H4, C2H6, CO, CO2), voltage and current signals, and voiceprint data. In related research on transformer condition monitoring, the literature provides a theoretical basis for DGA data analysis and feature extraction from current data [21–24]. Fig 5 shows the time-sequence curves of the transformer's typical three-phase currents (phases A, B, and C). Fig 6 shows the amplitude distribution of the vibration signal, reflecting the vibration intensity during equipment operation.
A total of 3240 valid samples were collected in the experiment, with the following category distribution: 1458 (45%) normal operation, 648 (20%) winding overheating, 486 (15%) insulation deterioration, 389 (12%) abnormal iron core, and 259 (8%) cooling system failure. Obvious outliers are eliminated using the 3σ criterion, and missing values are completed by linear interpolation. Z-score standardization is applied to each type of feature to ensure the comparability of features of different dimensions. A sliding-window technique is adopted, with the window length set to 64 time steps and the sliding step to 8, ensuring appropriate overlap between samples to increase data diversity. Based on manual patrol marking and expert judgment, the original samples are divided into the five typical operating status categories shown in Table 1. To deal with class imbalance, a stratified sampling strategy is adopted to keep the proportion of each category consistent across the training, validation, and test sets. Minority classes are augmented by time-series data perturbation, and inverse-proportional weights are introduced into the loss function. The sample set is divided into training, validation, and test sets in the proportion 70% : 15% : 15%.
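The sample-construction steps above (3σ outlier removal with linear interpolation, Z-score standardization, and 64-step sliding windows with stride 8) can be sketched for one channel as follows; the synthetic signal and injected outlier are illustrative only.

```python
import numpy as np

def preprocess(series, win=64, step=8):
    """3σ outlier removal with linear interpolation, Z-score
    standardization, then sliding windows of length `win`, stride `step`."""
    x = series.astype(float).copy()
    mu, sigma = x.mean(), x.std()
    bad = np.abs(x - mu) > 3 * sigma             # 3σ criterion
    idx = np.arange(len(x))
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])   # linear interpolation
    x = (x - x.mean()) / x.std()                 # Z-score standardization
    n = 1 + (len(x) - win) // step               # number of window samples
    return np.stack([x[i * step : i * step + win] for i in range(n)])

rng = np.random.default_rng(3)
sig = rng.standard_normal(1000)
sig[100] = 50.0                                  # inject an obvious outlier
samples = preprocess(sig)
print(samples.shape)
```

A 1000-point channel yields 118 overlapping 64-step samples at stride 8, which is how the overlap increases sample diversity from limited records.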
5.2 Model structure and training parameter setting
The fusion model proposed in this paper is based on a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (BiLSTM) to realize deep extraction and fused discrimination of image and time-series features. The network structure is shown in Fig 7. The feature vectors output by the two channels are weighted and fused through the attention mechanism to highlight key channels, feature fusion is performed through the fully connected layer, and finally the fault classification is completed by the Softmax classifier. The cross-entropy loss function is used during training. The optimizer is Adam with an initial learning rate of 0.0001, a batch size of 32, and 100 training epochs, allowing training to avoid oscillation and ensure complete convergence.
The performance of the proposed fusion model is analyzed in Fig 8. As seen in Fig 8(a), the loss value gradually decreases and stabilizes as the number of iterations increases, indicating good convergence. According to the accuracy curves in Fig 8(b), the training and test accuracies increase in step and finally stabilize at a high level without obvious overfitting.
(a) Loss function. (b) Training accuracy and test accuracy.
5.3 Analysis of assessment results
To verify the effectiveness and superiority of the proposed multi-source feature fusion modeling method, this paper makes a systematic comparative analysis against current mainstream transformer condition assessment methods [21–24]. The selected comparison models include the traditional machine learning algorithms support vector machine (SVM) and random forest (RF), which are highly representative for simple structures and small-sample learning. Among deep learning models, a BiLSTM structure and a 2D-CNN structure are selected to verify the independent performance of temporal and spatial modeling, respectively. In addition, to evaluate the impact of the attention mechanism on fusion performance, a CNN-BiLSTM combined model without the attention mechanism is set as a comparison baseline. Finally, the multi-source deep feature fusion model integrating 2D-CNN and BiLSTM with the attention mechanism is used for comprehensive comparison. By unifying the dataset, model configuration, and assessment indicators, the models are compared on fault identification accuracy, generalization ability, robustness, and computational efficiency, comprehensively evaluating the advantages of this method in transformer condition monitoring and assessment.
From the model performance comparison in Fig 9 and the evaluation results of the different models in Table 2, the accuracy, precision, recall, F1, and AUC of the proposed model are significantly better than those of the other models: its accuracy is 3.7% higher than CNN-BiLSTM without the attention mechanism, indicating that the attention mechanism effectively improves fusion, and 4.9% higher than a single BiLSTM, reflecting the advantage of multi-source feature fusion. In the ROC curves of Fig 10, the curve of the proposed model is closest to the upper-left corner with the highest AUC value, indicating the best comprehensive recognition ability across thresholds.
The ablation results in Table 3 clearly show the impact of different feature combinations and the attention mechanism on model performance, highlighting the key role of multi-source feature fusion and attention. In terms of single-feature performance, the model with only DGA input achieves 88.4% accuracy and an F1 score of 86.7%, better than current-only and voiceprint-only inputs, indicating that gas features have basic recognition value in transformer condition assessment, but the information limitations of a single feature keep overall performance low. Fusing two features improves performance: the accuracy of DGA-plus-current fusion is 91.7% with an F1 score of 90.1%, significantly higher than the other two-feature combinations, reflecting the complementarity of gas and electrical features. The fusion effects of DGA with voiceprint and of current with voiceprint decrease in turn, indicating that the synergy differs across features. When the three features are simply concatenated (without the attention mechanism), accuracy improves to 92.5% with an F1 score of 91.2%, further verifying the necessity of multi-source information integration. After introducing the attention mechanism, the accuracy and F1 score of the complete model reach 96.2% and 95.1% respectively, about 3.7% higher than the three-feature fusion without attention, proving that the attention mechanism effectively focuses on key features, suppresses redundant information, and greatly enhances the fusion effect.
The per-class assessment results in Table 4 show that the proposed multi-source deep fusion state assessment model achieves excellent and balanced recognition across the transformer operating states. The normal operating state (C1) is recognized best, which is closely related to the stability of equipment parameters under normal conditions, making them easy for the model to capture. Winding overheating (C2) and iron-core abnormality (C4) are also recognized well, with F1 scores of 95.4% and 94.5% respectively; the former has a low missed-detection rate and the latter a high judgment precision, reflecting the model's ability to capture characteristics such as current fluctuations and high-frequency voiceprint anomalies. The F1 scores of insulation deterioration (C3) and cooling system fault (C5) are relatively low, at 92.0% and 90.7% respectively, because the characteristics of insulation deterioration are easily disturbed, while cooling system fault samples are few and their characteristics easily confused with other states. At the macro-average level, precision, recall, and F1 score reach 94.5%, 93.7%, and 94.1% respectively. This further verifies the balance of the proposed model across the state categories and illustrates the significant role of multi-source feature fusion and the attention mechanism in improving its comprehensive discrimination ability.
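The macro-averaged indicators cited above treat every state class equally, which is what makes them sensitive to minority classes such as cooling system faults. A minimal sketch of their computation from a confusion matrix follows; the 3-class matrix is a hypothetical example, not the paper's five-state results.

```python
def macro_metrics(cm):
    """Macro-averaged precision, recall, and F1 from a square
    confusion matrix cm[true][pred]: compute each metric per class,
    then take the unweighted mean over classes."""
    k = len(cm)
    ps, rs, fs = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, wrong
        fn = sum(cm[c]) - tp                        # true c, missed
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p); rs.append(r); fs.append(f)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k

# Hypothetical 3-class confusion matrix (rows = true, cols = predicted).
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
precision, recall, f1 = macro_metrics(cm)
```

Because every class contributes 1/k to the average regardless of its sample count, a weak minority class (such as C5 here) pulls the macro F1 down visibly, which is exactly why the paper reports macro averages for the balance claim.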
6 Conclusions
This article explores online monitoring of transformer status and the fusion of multi-source data for ultra-high-voltage transformer state assessment, and proposes a multi-source deep fusion state assessment model and method. The method combines 2D-CNN and BiLSTM network structures to build a deep extraction and fusion discrimination mechanism for image and temporal features, introduces an attention mechanism, and uses extensive comparative experiments to verify the assessment accuracy and generalization ability for transformer operating status. A case analysis was conducted on the monitoring data of the 1000 kV ultra-high-voltage main transformer at Dongwu Substation. The research results indicate that:
- 1) The proposed method can effectively achieve accurate assessment of ultra-high-voltage transformer status under complex operating conditions. Compared with traditional machine learning algorithms and single deep learning models, this model integrates multi-source features and an attention mechanism, which better matches the multidimensional nature of transformer state monitoring, and, through comprehensive indicators such as the macro-averaged F1 score, achieves a balanced assessment across classes and provides more reliable and detailed state discrimination results.
- 2) Quantifying the impact of different feature combinations on model performance clearly demonstrates the key roles of multi-source fusion and the attention mechanism. With three-feature fusion and the attention mechanism, model accuracy reaches 96.2%, 3.7 percentage points higher than three-feature fusion without attention. In addition, the AUC advantage of this model over the comparison models validates its ability to distinguish complex states and capture temporal features, providing a new multi-source fusion perspective for ultra-high-voltage transformer state assessment and helping to systematically grasp the evolution of equipment operating status.
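The final state-discrimination step, a Softmax classifier combined with ensemble learning as described in the abstract, can be sketched by averaging the softmax outputs of several ensemble members. The member logits below are hypothetical illustrations for the five states C1-C5, not outputs of the trained model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [v / s for v in e]

def ensemble_predict(member_logits):
    """Average the softmax probability vectors of the ensemble members
    (soft voting) and return the index of the most probable class."""
    probs = [softmax(l) for l in member_logits]
    k = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(k)]
    return max(range(k), key=avg.__getitem__)

# Hypothetical logits from three ensemble members over five states C1..C5.
logits = [[2.0, 0.5, 0.1, 0.0, -1.0],
          [1.5, 1.8, 0.2, 0.1, -0.5],
          [2.2, 0.3, 0.0, 0.4, -0.8]]
state = ensemble_predict(logits)  # index 0, i.e. state C1
```

Averaging probabilities rather than taking a hard majority vote lets confident members outweigh uncertain ones, which is one plausible way such an ensemble stabilizes the discrimination.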
Deploying the proposed multi-source feature extraction and fusion method in an actual transformer monitoring environment requires upgrading existing infrastructure and corresponding investment. First, a multi-channel sensor platform must be installed to collect the key variable data, with sensors matched to the transformer's operating characteristics. To support the computing demands of the deep learning model, high-performance computing resources are also needed, possibly on edge or cloud computing platforms. After deployment, the sensors and data processing modules require continuous monitoring and maintenance to cope with dynamic changes in transformer operating status and with equipment failures. Beyond hardware infrastructure, data transmission and processing capabilities must also be improved, which may involve integrating and optimizing existing monitoring systems; professional training and technical support for operators are likewise necessary to ensure the method's operability and continuous operation.
In summary, although the proposed method has significant theoretical advantages, its practical application still requires investment in hardware upgrades, data processing platform construction, maintenance, and personnel training to ensure its effectiveness and sustainability in transformer monitoring environments.