Transformer Incipient Fault Prediction Using Combined Artificial Neural Network and Various Particle Swarm Optimisation Techniques

It is important to predict incipient faults in transformer oil accurately so that maintenance can be performed correctly, reducing maintenance cost and minimising errors. Dissolved gas analysis (DGA) has been widely used to predict incipient faults in power transformers. However, the existing DGA methods sometimes yield inaccurate predictions because each method is only suitable for certain conditions. Many previous works have reported on the use of intelligent methods to predict transformer faults. However, it is believed that the accuracy of the previously proposed methods can still be improved. Since the combination of artificial neural network (ANN) and particle swarm optimisation (PSO) techniques has not been used in the previously reported work, this work proposes a combination of ANN and various PSO techniques to predict transformer incipient faults. The advantages of PSO are its simplicity and ease of implementation. The effectiveness of the various PSO techniques in combination with ANN is validated by comparison with the results from actual fault diagnosis, an existing diagnosis method and ANN alone. The results from the proposed methods were also compared with previously reported work to show the improvement achieved. It was found that the proposed ANN-Evolutionary PSO method yields a higher percentage of correct identification of transformer fault type than the existing diagnosis method and previously reported works.


Introduction
The power transformer is one of the most important pieces of equipment in power systems, since it is vital for stepping the voltage up or down and for isolating electrical power. Thus, a transformer breakdown may interrupt the power system. A transformer fault leads to electrical and thermal stresses, which eventually cause the breakdown of insulating materials and the release of gaseous decomposition products. Faults such as corona, sparking, arcing and overheating release fault-related gases including hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6) and carbon monoxide (CO) [1,2].
Thus, transformer maintenance is very important, and proper monitoring of the transformer condition helps to avoid transformer breakdown. Transformer oil condition monitoring is one of the fundamental methods of maintaining power transformers. The oil tests can be categorised into physical, chemical, electrical and environmental types. Among these, the chemical test known as dissolved gas analysis (DGA) is commonly used; it diagnoses faults based on certain ratios of dissolved gases in the oil sample [3]. The existing DGA methods are the key gas method, Doernenburg's ratio method, Rogers' ratio method and the IEC method.
DGA methods involve processing data from the transformer oil sample and recognising faults through the analyst's experience and ability. However, the main problems with the existing DGA methods are that they rely heavily on experts and that actual site testing has shown that different DGA methods can indicate different fault types. Hence, research on reliable techniques to diagnose transformer faults is actively ongoing.
Many works have been conducted on the application of artificial intelligence and optimisation to condition monitoring and fault diagnosis of power system components, including transformer incipient fault diagnosis [4][5][6][7][8][9][10]. The techniques used include artificial neural network (ANN), fuzzy logic, rough set theory, support vector machine (SVM) and genetic programming.
One of the most widely used artificial intelligence methods in transformer fault prediction is the artificial neural network (ANN) [1,11,12]. ANN is widely used because it can learn directly from the training data and its computational complexity is low. It is also adaptive, able to handle various nonlinear relationships and can generalise solutions to a new data set [13]. ANN directly implements the association between inputs and outputs, which for transformer incipient fault prediction are the gas concentrations and the fault type respectively. Hence, a physical model and a predefined correspondence function are not required. However, the convergence is slow and oscillation sometimes occurs. Also, the parameters of the ANN, such as the number of neurons and hidden layers, must be properly chosen in order to obtain the best performance of the network.
Much research has been performed on the use of ANN with DGA methods to facilitate the detection of transformer incipient faults [11,14]. The inputs and outputs from the DGA results of transformer oil were used to train a neural network, and the fault type was then identified from the trained network. Although the use of ANN in transformer incipient fault detection seems reasonable, the chosen ANN parameters might not yield the best accuracy of the network output.
In one of the previous works, ANN was combined with a knowledge-based expert system for transformer fault diagnosis from DGA [4]. The combination of both methods yields better performance than either method used individually. This is because the combination takes advantage of the superior features of each method and allows each to dominate the diagnosis of different faults. The use of fuzzy logic has shown that the fault type of a transformer can be obtained efficiently [15]. Fuzzy logic was applied as a practical representation of the relationship between the gas content levels and the fault type through fuzzy membership functions. A combination of three fuzzy methods was shown to be more accurate than a single fuzzy system in identifying the transformer fault type.
A combination of Artificial Immune System (AIS) and ANN was proposed in [7] to assess the transformer fault type based on DGA. The AIS was used to determine the centres of the Radial Basis Function Neural Network (RBFNN). It was shown that the combination of AIS and RBFNN yields better transformer diagnosis accuracy than random selection and k-means clustering in determining the RBFNN hidden centres. Other neural network applications have also been employed to improve the diagnostic accuracy of power transformer fault classification based on DGA. Bootstrap and genetic programming (GP) feature extraction were combined with ANN and k-nearest neighbour (KNN) classifiers [5]. The bootstrap eliminated the less fault type samples in the DGA data. Then, the features of the DGA data extracted with GP were used as the inputs to the ANN and KNN classifiers. It was reported that bootstrap and GP combined with KNN yields a higher accuracy of transformer fault classification.
A genetic wavelet network (GWN) was proposed to enhance the existing DGA methods for transformer incipient fault identification [8]. The method combined genetic algorithm (GA), wavelet and ANN. GA was used to determine the optimal parameters of the GWN to achieve the best diagnostic DGA model. The wavelet transform and the decomposed data features extracted important information from the input data. It was reported that the proposed GWN method yields better classification for transformer fault identification than the same approach without the wavelet transform.
Self-organising polynomial networks (SOPN) were proposed as an intelligent decision-making tool for transformer fault diagnosis [9]. In this technique, the problem is heuristically formulated into a hierarchical architecture with several layers of simple low-order polynomial functional nodes. The networks handled the complicated and uncertain relationships of DGA data from transformer oil samples. The work reported that the proposed method yields far superior performance to the conventional DGA and ANN classification methods.
Although many previous works have reported on the use of intelligent methods to predict transformer oil faults, it is believed that the accuracy of the previously proposed methods can still be improved. Since the combination of artificial neural network (ANN) and particle swarm optimisation (PSO) techniques has not been reported in the previous literature, a combination of ANN and various PSO techniques is proposed in this work to predict transformer incipient faults. The advantages of PSO are its simplicity and ease of implementation. In this work, the possibility of using various PSO techniques with ANN to identify transformer incipient faults is explored.
PSO is an evolutionary algorithm that is widely implemented in optimisation problems [16][17][18][19]. The algorithm was inspired by the social behaviour of animals, such as bird flocking and fish schooling [20][21][22][23]. PSO is a population-based search algorithm that is conceptually simple, easy to implement, computationally efficient and fast to converge, and it is able to avoid local minima. Hence, these characteristics are advantageous for complex optimisation problems that involve a large number of parameters and for which analytical solutions are difficult to obtain.
In this work, the PSO methods used are the conventional PSO, iteration PSO (IPSO) and evolutionary PSO (EPSO) methods. The percentages of correct prediction of transformer incipient fault from the proposed methods were compared with each other and with those of the ANN-only method and the existing DGA method. The results from the proposed methods were also compared with previously reported work to show the improvement achieved. Hence, the best type of PSO method to combine with ANN could be identified, which may improve transformer incipient fault diagnosis.
This paper is presented as follows. In section 2, the implementation of ANN is described. Section 3 explains various methods of PSO used in this work. They include PSO, iteration PSO (IPSO) and evolutionary PSO (EPSO). Section 4 discusses all results obtained from each method. The section includes results from the application of various PSO techniques and ANN, ANN alone and an existing DGA method in predicting the transformer incipient fault. Finally, section 5 summarises all findings obtained from this work.

Artificial Neural Network (ANN)
ANN is a computational model that imitates biological neural networks; it consists of interconnected neurons that compute the output from the input. ANN is widely used due to its ability to interpolate and extrapolate from experience of the analysed data and to reveal highly nonlinear input-output relationships [24]. Since the condition of electrical systems changes, ANN can adapt itself to a new state and incorporate that state into new training [25]. Hence, ANN is a good approach for computing relationships that are difficult to describe explicitly. In this work, the neural network was developed in the MATLAB programming language.

Input and output data
In designing an ANN, the inputs, outputs and network topology are selected according to the performance of the ANN model [26]. The data on gas composition with respect to incipient transformer fault were obtained from the actual data of an electrical utility. The design of the ANN can be divided into two stages: training and testing. The flowchart of the development of the ANN is shown in Fig 1. Table 1 shows some of the actual data of incipient transformer fault from an electrical utility that were used in this work.

Training Stage
Firstly, the input data and target data are imported into the network. The gas composition was set as the input and the transformer incipient fault was set as the target. In this work, 100 input data sets, each consisting of 6 types of gases, were used, while the fault, which was used as the output, was classified into no fault, thermal fault, low intensity and high intensity. These data were divided into training, validation and testing sets. The training set comprised 70% of the 100 data sets, and the validation and testing sets comprised 15% each.
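The 70/15/15 division described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in this work; the function name `split_indices`, the random shuffling and the seed are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)          # assumed: random shuffling with a fixed seed
    order = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]             # the remainder forms the test set
    return train, val, test

# 100 DGA samples, as in this work
train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each index array can then be used to select the rows of the gas-concentration matrix and the corresponding fault labels.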
In the training stage, the backpropagation algorithm, a generalised delta rule for feed-forward networks with multiple layers, is used. This is because it can compute the gradient of each layer iteratively using the chain rule [27]. Generally, a sigmoid activation function is used because of its nonlinearity and its compatibility with the feed-forward backpropagation learning algorithm. In this work, Levenberg-Marquardt (LM) is used as the training function since it is a fast, simple and robust algorithm. Thus, the feed-forward backpropagation learning algorithm was set as the network type for the ANN architecture of this work.
By tuning the number of hidden layers, the number of neurons and the transfer function, the best parameters for the ANN were selected as those giving the highest accuracy, measured by the regression coefficient, R. A three-layer network consisting of two hidden layers and one output layer was used in this work. Although one hidden layer is enough for nonlinear mapping, a network with two hidden layers is optimal in terms of iteration number, accuracy and complexity compared with networks with one and three hidden layers. Moreover, a three-layer network can overcome the problem of a slow training rate.
In developing the best ANN, the learning rate (LR) and momentum constant (MC) were varied from 0 to 0.9 to obtain the optimised values of LR and MC [28]. Since all parameters were varied heuristically, underfitting and overfitting of the network could occur. Overfitting occurs when the network is able to memorise the training data but cannot generalise to new data. To overcome the overfitting problem, the early stopping technique was applied. The stop criterion is determined by comparing the mean square error obtained on the data during training with a certain limit.

Testing Stage
To test the trained network, a new set of data was simulated. The output of the new data set was simulated using the trained ANN. For the best trained network, the simulated output agrees well with the target output. A regression coefficient, R, is used to determine the performance of the trained network.
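As an illustration of how such an R value can be obtained, the sketch below computes the linear correlation coefficient between simulated outputs and targets. It assumes that R is the Pearson correlation coefficient, as is common in regression plots; the paper does not state the formula, and the function name `regression_r` is hypothetical.

```python
import numpy as np

def regression_r(outputs, targets):
    """Correlation coefficient R between network outputs and target values.

    Assumption: R is the Pearson correlation coefficient; R = 1 means the
    outputs follow the targets exactly along a straight line.
    """
    outputs = np.asarray(outputs, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    return float(np.corrcoef(outputs, targets)[0, 1])

# Illustrative values only, not data from this work
print(regression_r([0.1, 0.9, 2.1, 2.9], [0, 1, 2, 3]))
```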

Particle Swarm Optimisation (PSO)
PSO is a computational optimisation technique invented by Kennedy and Eberhart in 1995 based on the concept of bird flocking and fish schooling behaviour [29]. An optimisation problem can be pictured as a flock of birds flying across an area in search of a spot with abundant food. Finding the optimised LR and MC values with ANN alone is time-consuming and computationally difficult because the search is heuristic. Hence, PSO is a better approach to find the optimised LR and MC values in ANN. In this work, the MATLAB programming language was used to execute the various PSO algorithms. In PSO, a population-based search is used to optimise the objective function. Firstly, the potential solutions, known as particles, are initialised randomly and explore a search region of dimension d. As each particle updates its velocity and position, the particle swarm moves nearer to the region with the higher objective value. The flowchart of the PSO technique is shown in Fig 2. The steps in PSO are explained as follows [30]:
Step 1: Initialisation. The swarm is initialised by setting the position and the velocity of the particles randomly.
Step 2: Evaluate fitness function. The fitness value of each particle with updated position and velocity is calculated.
Step 3: Update $pbest_{id}^{j}$ and $gbest_{id}^{j}$. The fitness value of each particle is compared with its personal best, $pbest_{id}^{j}$. If the new fitness value is better than $pbest_{id}^{j}$, this value is set as $pbest_{id}^{j}$ together with the current position of the particle, $X_{id}^{j}$. Among all the particles, the best fitness value is set as $gbest_{id}^{j}$, the global best value.
Step 4: Update velocity and position. The velocity and position of all particles are updated using
$$V_{id}^{j+1} = w V_{id}^{j} + c_1 r_1 \left(pbest_{id}^{j} - X_{id}^{j}\right) + c_2 r_2 \left(gbest_{id}^{j} - X_{id}^{j}\right) \qquad (1)$$
$$X_{id}^{j+1} = X_{id}^{j} + V_{id}^{j+1} \qquad (2)$$
where
$V_{id}^{j+1}$ = updated velocity of particle i in dimension d search region
$V_{id}^{j}$ = velocity of particle i at iteration j
$X_{id}^{j+1}$ = updated position of particle i in dimension d search region
$X_{id}^{j}$ = position of particle i at iteration j
$c_1$, $c_2$ = acceleration factors
$r_1$, $r_2$ = random numbers between 0 and 1
$w$ = inertia weight $= w_{max} - \dfrac{w_{max} - w_{min}}{iteration_{max}} \times iteration$
$w_{max}$ = maximum weight
$w_{min}$ = minimum weight
Step 5: Meet the end condition. Steps 2-4 are repeated until the stopping criterion is fulfilled, that is, until the best fitness value is achieved or the maximum number of iterations has been reached. In this work, $c_1$ and $c_2$ were both set to 0.7, $w_{max}$ and $w_{min}$ were set to 0.9 and 0.4 respectively, and the maximum number of iterations was set to 100. These are common values used in most PSO algorithms.
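The five steps can be sketched in a few lines of code. The example below is a minimal Python illustration rather than the MATLAB implementation used in this work. It uses the parameter values stated above (c1 = c2 = 0.7, w_max = 0.9, w_min = 0.4, 100 iterations). The swarm size, the clipping of positions to the search bounds and the quadratic stand-in objective are assumptions made for the example; in this work the objective would come from training the ANN with the candidate LR and MC values.

```python
import numpy as np

def pso(objective, bounds, n_particles=20, max_iter=100,
        c1=0.7, c2=0.7, w_max=0.9, w_min=0.4, seed=0):
    """Minimise `objective` with PSO using a linearly decreasing inertia weight."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    # Step 1: random initial positions and velocities
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = rng.uniform(-(hi - lo), hi - lo, size=(n_particles, dim))
    # Step 2: evaluate the fitness of every particle
    fit = np.array([objective(p) for p in x])
    # Step 3: initialise personal and global bests
    pbest, pbest_fit = x.copy(), fit.copy()
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    for it in range(1, max_iter + 1):
        # Inertia weight decreases linearly from w_max towards w_min
        w = w_max - (w_max - w_min) / max_iter * it
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 4: velocity and position updates
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)   # assumption: keep particles inside the bounds
        fit = np.array([objective(p) for p in x])
        improved = fit < pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    # Step 5: stop after max_iter iterations
    return gbest, gbest_fit

def toy_error(p):
    """Hypothetical stand-in for the ANN error as a function of (LR, MC)."""
    return (p[0] - 0.3) ** 2 + (p[1] - 0.6) ** 2

best_pos, best_val = pso(toy_error, bounds=[(0.0, 0.9), (0.0, 0.9)])
print(best_pos, best_val)
```

Here the search is written as a minimisation; maximising R would be equivalent to minimising, for example, 1 - R.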

Iteration particle swarm optimisation (IPSO)
PSO is easily trapped in local minima. Thus, an improved PSO algorithm called iteration PSO (IPSO) has been suggested for better performance [31]. In IPSO, the best value of each iteration, $I_{best,d}^{j}$, is employed to improve the accuracy of the basic PSO. The velocity equation in Eq (1) is modified by adding a stochastic acceleration term, weighted by $c_3$, that attracts each particle toward $I_{best,d}^{j}$ [32], where
$I_{best,d}^{j}$ = best fitness value obtained by any particle in iteration j
$c_3$ = weight of the stochastic acceleration term that attracts each particle toward $I_{best,d}^{j}$, given by
$$c_3 = c_1 \left[1 - \exp\left(-c_1 \times iteration\right)\right] \qquad (4)$$
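A possible form of the IPSO velocity update is sketched below in Python. The weight c3 follows Eq (4). The random factor `r3` on the additional term and the use of the position of the best particle of the current iteration as `ibest` are assumptions, chosen by analogy with the pbest and gbest terms of Eq (1); the function names are hypothetical.

```python
import math
import numpy as np

def c3_weight(c1, iteration):
    """Eq (4): c3 = c1 * (1 - exp(-c1 * iteration))."""
    return c1 * (1.0 - math.exp(-c1 * iteration))

def ipso_velocity(v, x, pbest, gbest, ibest, w, c1, c2, iteration, rng):
    """One IPSO velocity update for a single particle (or a whole swarm array).

    Assumptions: `ibest` is the position of the best particle in the current
    iteration, and the extra term carries its own random factor r3.
    """
    r1, r2, r3 = (rng.random(x.shape) for _ in range(3))
    c3 = c3_weight(c1, iteration)
    return (w * v
            + c1 * r1 * (pbest - x)
            + c2 * r2 * (gbest - x)
            + c3 * r3 * (ibest - x))

# c3 starts at zero and approaches c1 as the iterations proceed
print([round(c3_weight(0.7, k), 3) for k in (0, 1, 5, 20)])
```

With this schedule the pull toward the iteration best is negligible at the start of the search and grows to the same order as c1 later on.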

Evolutionary Particle Swarm Optimisation (EPSO)
EPSO is an optimisation technique that combines the concepts of evolutionary strategies and PSO. The main advantage of EPSO is that the particle search is not focused only on the region of the global best fitness value; the neighbourhood is also searched, since the optimum may lie there if it has not yet been found. The concepts of duplication, mutation and reproduction are employed in EPSO. For duplication, each particle is duplicated. Next, each particle with mutated weight $w^{*}$ reproduces an offspring by following the particle movement rule. The velocity equation in Eq (1) is modified to use the mutated weights and the mutated global best position [33], where
$gbest_{id}^{*} = gbest_{id}^{j} + \tau' N(0,1)$ is the mutated global best position
$w_{ik}^{*} = w_{ik} + \tau N(0,1)$ is the mutated weight
$\tau$ and $\tau'$ = learning parameters
$N(0,1)$ = a random variable with Gaussian distribution with zero mean and unit variance
In this work, $\tau$ and $\tau'$ were varied from 0.1 to 0.9 until the best output from the ANN was achieved. It was found that the most suitable value for both $\tau$ and $\tau'$ is 0.3.
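For orientation, the movement rule commonly given for EPSO in the literature can be written as below. This is a sketch under assumptions and not necessarily the exact expression used in this work: the weight index k is taken to run over 0, 1 and 2 (inertia, memory and cooperation weights), replacing w, c1 r1 and c2 r2 of Eq (1).

```latex
% Assumed general EPSO movement rule; k = 0, 1, 2 indexes the mutated weights
V_{id}^{j+1} = w_{i0}^{*}\, V_{id}^{j}
             + w_{i1}^{*} \left( pbest_{id}^{j} - X_{id}^{j} \right)
             + w_{i2}^{*} \left( gbest_{id}^{*} - X_{id}^{j} \right),
\qquad
w_{ik}^{*} = w_{ik} + \tau N(0,1), \quad
gbest_{id}^{*} = gbest_{id}^{j} + \tau' N(0,1)
```

Under this form, each duplicated particle moves with its own mutated weights, and the better of the original and the offspring is retained for the next iteration.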

Diagnosis Results
Fault prediction using IEC 60599 method
100 data sets were used in this work and classified into electrical fault, thermal fault and no fault, with 32 cases, 16 cases and 50 cases respectively. The IEC 60599 method was used to predict the fault from the gas ratios. The results are tabulated in Table 2. Comparing the indicated fault with the actual fault, the IEC 60599 method achieves 70% correct prediction of the transformer fault.
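As a small illustration of the gas-ratio step, the sketch below computes the three ratios generally associated with the IEC ratio method (C2H2/C2H4, CH4/H2 and C2H4/C2H6) from gas concentrations in ppm. The choice of these ratios is stated here as background and is not quoted from this paper; the function name and the sample values are hypothetical, and no fault thresholds are applied.

```python
def iec_gas_ratios(h2, ch4, c2h2, c2h4, c2h6):
    """Return the three gas ratios used as inputs to a ratio-based DGA diagnosis.

    A ratio with a zero denominator is returned as NaN so that it can be
    flagged rather than raising an error.
    """
    def safe_div(num, den):
        return num / den if den else float("nan")

    return {
        "C2H2/C2H4": safe_div(c2h2, c2h4),
        "CH4/H2": safe_div(ch4, h2),
        "C2H4/C2H6": safe_div(c2h4, c2h6),
    }

# Illustrative concentrations in ppm, not data from this work
print(iec_gas_ratios(h2=100, ch4=50, c2h2=10, c2h4=40, c2h6=20))
```

The resulting ratios would then be looked up in the fault interpretation table of the standard to obtain the indicated fault type.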

Fault prediction using ANN alone
The simulations were performed using different numbers of hidden layers, numbers of neurons, learning rates (LR) and momentum constants (MC) in the ANN. The process of finding the best ANN model is as follows:
1. Variation of number of neurons in hidden layer (HL).
Before determining the parameters LR and MC, typical values of 0.05 and 0.95 were used for LR and MC respectively. The training function and learning function used were Levenberg-Marquardt (TRAINLM) and gradient descent with momentum weight and bias learning function (LEARNGDM) respectively. The number of neurons in HL1 was increased from 2 to 20 in steps of 2 while keeping the other parameters constant. The transfer functions of HL1 and HL2 were both log-sigmoid (logsig), while the transfer function of the output layer was pure linear (PURELIN).
The next stage was to increase the number of neurons in HL2 from 2 to 20 in steps of 2 while keeping the other parameters constant. 100 sets of data with different numbers of neurons in HL1 and HL2 were tested. The ANN model with the highest R value was selected, as shown in Table 3. The numbers of neurons for HL1 and HL2 were selected as 4 and 10 respectively, since higher numbers of neurons do not yield any further improvement in the network performance.
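The selected topology can be illustrated with a forward pass through a network with 6 gas inputs, 4 and 10 log-sigmoid neurons in HL1 and HL2, and a linear output layer. This is a Python sketch only, not the MATLAB network used in this work: the single output neuron, the random weight initialisation and the function names are assumptions, and training (Levenberg-Marquardt in this work) is not shown.

```python
import numpy as np

def logsig(z):
    """Log-sigmoid transfer function, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def init_network(n_in=6, n_h1=4, n_h2=10, n_out=1, seed=0):
    """Random weights and zero biases for a 6-4-10-1 network (n_out is assumed)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_h1, n_h2, n_out]
    return [(rng.normal(0.0, 0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: logsig -> logsig -> linear output."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = logsig(x @ w1 + b1)      # hidden layer 1 (HL1)
    h2 = logsig(h1 @ w2 + b2)     # hidden layer 2 (HL2)
    return h2 @ w3 + b3           # pure linear output layer

params = init_network()
sample = np.ones((5, 6))          # 5 illustrative samples with 6 gas inputs each
print(forward(params, sample).shape)
```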

2. Variation of LR and MC values.
By keeping MC constant, the value of LR was increased from 0 to 0.9 with a step of 0.01. For each LR, the value of MC was increased from 0 to 0.9 with a step of 0.1. The outcome from this neural network is shown in Table 4.
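The exhaustive variation of LR and MC described above amounts to a grid search. The Python sketch below uses the stated ranges and step sizes; the scoring function `toy_score` is a hypothetical stand-in for training the ANN with a given (LR, MC) pair and returning its R value.

```python
def grid_search(score, lr_values, mc_values):
    """Return the (LR, MC, score) triple with the highest score on the grid."""
    best = None
    for lr in lr_values:
        for mc in mc_values:
            r = score(lr, mc)
            if best is None or r > best[2]:
                best = (lr, mc, r)
    return best

# LR from 0 to 0.9 in steps of 0.01, MC from 0 to 0.9 in steps of 0.1
lr_values = [i / 100 for i in range(0, 91)]
mc_values = [i / 10 for i in range(0, 10)]

def toy_score(lr, mc):
    """Hypothetical stand-in for the R value of an ANN trained with (lr, mc)."""
    return 1.0 - (lr - 0.30) ** 2 - (mc - 0.60) ** 2

best_lr, best_mc, best_r = grid_search(toy_score, lr_values, mc_values)
print(len(lr_values) * len(mc_values), best_lr, best_mc)
```

The grid contains 910 combinations, each of which requires a full training run, which is the cost that the PSO-based search is intended to reduce.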

Fault prediction using ANN with Particle Swarm Optimisation (PSO)
To find the optimised values of LR and MC in the ANN, PSO was employed. Since two parameters need to be optimised, LR and MC were defined as particle1 and particle2 respectively in the PSO. Using PSO, the optimised values of LR and MC and the best position were obtained. The PSO simulation was run 20 times to test the robustness of the method. Next, the ANN was trained with the values obtained from the PSO. The results from the ANN-PSO method are shown in Table 5.

Fault prediction using ANN combined with iteration PSO (IPSO)
The ANN model was also combined with IPSO. The IPSO technique was used to obtain the optimised values of LR and MC. The same process as for the ANN-PSO method was repeated for the ANN-IPSO method. The optimised values of LR and MC, the R value and the percentage of correct fault prediction obtained from this method are shown in Table 6. The corresponding results for the ANN-EPSO method are shown in Table 7.

Comparison between ANN, ANN-PSO, ANN-IPSO and ANN-EPSO methods
To identify the best technique for predicting transformer incipient faults, ANN alone and ANN combined with the PSO, IPSO and EPSO techniques were compared in terms of R value, percentage of correct fault prediction and convergence rate. From Table 8, it can be clearly seen that the ANN-EPSO technique yields the highest R value and percentage of correct fault prediction, followed by the ANN-IPSO technique, the ANN-PSO technique and finally ANN alone. All of these methods yield a higher percentage of correct prediction of transformer incipient fault than the existing DGA method, which is the IEC method. The R value for ANN alone is the lowest because ANN is heuristic in nature. In the EPSO technique, the values of LR and MC obtained underwent duplication, mutation and reproduction. Thus, ANN with EPSO yields the best results in identifying the transformer incipient fault compared with the ANN, ANN-PSO and ANN-IPSO techniques.
The proposed methods were also compared with existing methods that have been reported in the literature. Table 9 summarises the comparison results. From this table, it can be seen that the proposed ANN-EPSO yields the highest percentage of correct transformer fault identification. Although EPSO is simple and easy to implement, it yields the best result when combined with ANN. In EPSO, the particle search is not focused only on the region of the global best fitness value, since the optimum may lie in the neighbourhood if it has not yet been found. A graph of the best position against iteration for ANN with each PSO technique is shown in Table 8.

Conclusions
In transformer incipient fault recognition, the relationship between gas type and fault is nonlinear. This causes problems with the convergence rate and oscillation in the artificial neural network (ANN). The parameters of the ANN must also be properly tuned in order to obtain the best performance of the network. To overcome these problems, this work has proposed a method combining ANN with various particle swarm optimisation (PSO) techniques to predict transformer incipient faults. In this method, the ANN was used to identify the transformer incipient fault and various PSO techniques were applied to optimise the performance of the ANN. The performance of the various PSO techniques in combination with ANN was compared with the existing DGA method, ANN alone and previously reported work to identify the best method for transformer incipient fault prediction. It was found that the combination of ANN with evolutionary PSO (EPSO) yields the best performance in transformer fault prediction compared with the existing DGA method and previously reported works. Hence, this method can be proposed as one of the solutions for field diagnosis of transformer incipient faults.