Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment

In order to overcome the problems of poor understandability of the pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of ELM and Ant-miner algorithm are respectively introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. And finally, a set of classification rules are obtained by IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted from an example sample set generated by the trained ELM-based transient stability assessment model by using IAM algorithm. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system — the southern power system of Hebei province.


Introduction
Transient stability is concerned with the stability of the power system to maintain synchronism when subjected to a severe disturbance, such as a short circuit on a transmission line [1]. Transient stability assessment (TSA) has been recognized as an important issue to ensure the secure operation of power systems. With interconnection of large-scale power grids, electricity market reform and growing presence of large-scale intermittent renewable energy, the dynamic behaviors of power systems are becoming more complex and difficult to be controlled, with more serious consequences resulted from transient instability [2]. The available TSA methods, such as time domain simulation methods [3], energy function based methods [4] and the extended equal-area criterion [5], can not meet the demands of online applications required by the modern power systems. With the rapid development of computational intelligence, the pattern recognition-based TSA (PRTSA) methods have shown much potential for on-line application to power systems [6][7][8][9]. Its main task is to establish a mapping relationship between system state variables and system stability conclusions. Compared to the other TSA methods, the PRTSA methods have a lot of advantages, such as strong learning ability, fast assessment speed and

Principle of ELM
For N arbitrary distinct samples (x i , y i ) 2 R n × R m , where x i = [x i1 , x i2 ,Á Á Á, x in ] T 2 R n is the feature vector and y i = [y i1 , y i2 ,Á Á Á, y im ] T 2 R m is the target vector, standard SLFNs with L hidden nodes can be mathematically modeled as where w i = [w i1 , w i2 ,Á Á Á, w in ] T is the weight vector connecting the ith hidden node and the input nodes, β i = [β i1 , β i2 ,Á Á Á, β im ] T is the weight vector connecting the ith hidden node and the output nodes, and d i is the threshold of the ith hidden node, G(Á) is the activation function. Eq (1) can be written compactly as where H is the hidden layer output matrix of the neural network, ELM is to minimize the training error as well as the norm of the output weights [13,14] Minimize : kHβ À Yk 2 and kβk ð 3Þ The minimal norm least square solution of (2) is as follows.
β¼ H y Y: where H † is the Moore-Penrose generalized inverse of matrix H. Given a training set, the activation function and the hidden nodes, learning process of ELM is as follows: (a) randomly generated parameters of the hidden layer nodes (w i , d i ), i = 1,Á Á Á, L; (b) calculate the hidden layer output matrix H; (c) calculate the output weights β.

Introduction of Ant-miner algorithm
Ant-miner algorithm is to simulate the process of extraction rules into the process of ants foraging, and the optimal path is chosen as the optimal classification rules [17]. The specific steps are described as follows.
Step 1: Initialization of pheromones. The initial path of the pheromone τ ij is set as follows.
where t and a are respectively the number of iterations and attributes, and b i is the number of values in the domain of the ith attribute. Pheromone level is updated in two phases: evaporation and reinforcement. Evaporation is accomplished by a pheromone evaporation rate ρ, and reinforcement of the pheromone trail is only applied to the best ant's path.
Step 2: Selection of attributes. Let term ij be a rule condition of the form A i = V ij , where A i is the ith attribute value and V ij is the jth value of the domain of A i . The probability that term ij is selected to be added to the current partial rule is determined by the decision P ij : where τ ij (t) is the pheromone of term ij in the tth iteration, the heuristic function η ij is represented by (7), (8).
where k is the number of categories, InfoT ij is the information entropy of term ij , |T ij | is the samples number of the partition T ij , freqT e ij is the samples number of class e in T ij .
Step 3: Rule pruning. Aiming at the problem of over-fitting, the obtained rules are needed to be pruned. The effectiveness of the rules Q is determined by (9).
where, TP (true positives) is the number of cases covered by the rule that have the class predicted by the rule, FP (false positives) is the number of cases covered by the rule that have a class different from the class predicted by the rule, FN is (false negatives) the number of cases that are not covered by the rule but that have the class predicted by the rule, and TN (true negatives) is the number of cases that are not covered by the rule and that do not have the class predicted by the rule.
where ρ is the pheromone evaporation rate.
Step 5: Choose the best rule R best , and adding it to the rule sets.
Step 6: Delete the training samples covered by the existing rules.
Step 7: Repeat step 1~6, until the number of training samples is not bigger than the maximum number of the preset uncovered samples.

Steps of rule extraction
From a functional point of view, a rule extraction method based on ELM and IAM is proposed for TSA as shown in Fig 1. The proposed method focuses on the mapping relationship between the state information and the stability result of power systems, and emphasizes the ability to reproduce the function of the ELM classifier, which does not consider the type and structure of ELM. The basic idea of the proposed method is to regard the trained ELM as a new sample space, and then use IAM algorithm to extract the hidden knowledge into rules with good understandability. In this way, the classification rules are generated to functionally replace the original ELM network.
The proposed rule extraction approach can be divided into three steps, comprising feature selection, acquisition of the example sample set and rule learning.

Feature selection
As is well known, feature selection is an issue of paramount importance for PRTSA. In [23], a feature selection method based on kernelized fuzzy rough sets (KFRS) and the memetic algorithm is proposed for TSA. By defining a KFRS-based generalized classification function as the separability criterion, the memetic algorithm based on binary differential evolution and Tabu search is employed to obtain the optimal feature subsets with the maximized classification capability (see [23] for further details).
The approach presented here uses the same feature selection method in [23], and the optimal feature subsets obtained for the New England 39-bus system and the southern power system of Hebei province are used as the input features in this study, as respectively shown in Tables 1 and 2. Here, t 0 and t cl denote the fault occurrence and clearing time in turn, t cl+3c, t cl+6c and t cl+9c respectively denotes the 3-rd, 6-th and 9-th cycle after the fault clearance.

Acquisition of the example sample set
An example consists of the input mode and output mode. If an example is judged by using the trained ELM model, the assessment result is used as the output mode. Then, a new example can be obtained by composing the obtained output mode and the original input mode, which reflects the response characteristic of ELM in a certain extent [24]. If the examples are sufficient enough and cover the entire sample space, the rules obtained will have the similar functions with the original ELM, i.e., these rules can describe the functions of the original network. The steps of generating the example sample set are listed as follows.
Step 1: Based on the training set A with class labels, the TSA model is obtained by using ELM.
Step 2: Determine the range of the input features, and then generate a random data set B 1 (the input modes) without the corresponding class labels in the range. In addition, one should to note that the data set B 1 has the same input features with the training set A, and their values are different.
Step 3: The obtained data set B 1 is assessed by using the trained ELM-based TSA model, and then the corresponding class labels (the output modes) are obtained.
Step 4: Finally, the example sample set B used for the follow-up rule learning can be acquired by combining with the input and output modes, which are respectively obtained in Step 2 and Step 3.
Rule learning IAM algorithm. "No free lunch" theorem [25] shows that there is no optimization algorithm is optimal for any problems such as global optimization ability and convergence speed. The original Ant-Miner algorithm is improved from two aspects in this paper. On the one Table 2. The input features for the southern power system of Hebei province.

No.
Input features

Tz1
Mean value of all the initial acceleration power

Tz2
Maximum value of all the rotor kinetic energies at t cl

Tz3
Rotor angle of the machine with the biggest difference relative to the centre of inertia at t cl+3c

Tz4
Maximum value of the difference of rotor angles at t cl+3c

Tz5
Kinetic energy of the machine with the maximum rotor angle at t cl+3c

Tz6
Rotor angle of the machine with the biggest difference relative to the centre of inertia at t cl+6c

Tz7
Maximum value of the difference of rotor angles at t cl+6c

Tz8
Rotor angular velocity of the machine with the biggest difference relative to the centre of inertia at t cl+6c

Tz9
Rotor angle of the machine with the biggest difference relative to the centre of inertia at t cl+9c Tz10 Maximum value of the difference of rotor angles at t cl+9c Tz11 Rotor angular velocity of the machine with the biggest difference relative to the centre of inertia at t cl+9c doi:10.1371/journal.pone.0130814.t002 Tz3 Rotor angular velocity of the machine with the biggest difference relative to the centre of inertia at t cl+3c Tz4 Rotor angle of the machine with the biggest difference relative to the centre of inertia at t cl+6c Tz5 Rotor angular velocity of the machine with the biggest difference relative to the centre of inertia at t cl+6c Tz6 Rotor angle of the machine with the biggest difference relative to the centre of inertia at t cl+9c Tz7 Rotor angular velocity of the machine with the biggest difference relative to the centre of inertia at t cl+9c hand, an adaptive tuning strategy is adopted to tune the pheromone evaporation rate ρ; on the other hand, the heuristic function is improved to reduce the computational overhead.
(1) Improvement of pheromone evaporation rate ρ. In Ant-miner algorithm, the control parameter ρ plays an important role in the performance of Ant-miner algorithm, so an adaptive tuning strategy is adopted here. If ρ is bigger, the algorithm is not easy to fall into local optimum, but its convergence speed is slow; otherwise, its convergence speed is fast, but it is easy to fall into local optimum. In this paper, a self-adaptive adjustment control way is employed to improve the performance of the algorithm, ( where ρ min is the minimum of ρ. By setting the dynamic parameter ρ, the useful information in the last search can be preserved, which facilitates a more finer search in a better area. By this means, not only the convergent speed but also the searching ability are enhanced. (2) Improvement of the heuristic function. The heuristic value in AntMiner is defined as an information theoretic measure in terms of the entropy. Unfortunately, the information entropy-based heuristic function is complex and time-consuming. With the assumption that the small induced errors are compensated by the pheromone level [26], a density-based heuristic function is employed as shown in (12), which makes IAM computationally less expensive without a significant degradation of the stated performance.
where |T ij | and |Ts| are the first iteration t, respectively, the total number of samples in the sample number and the division T ij of the training set.
Process of IAM algorithm. An iteration of the IAM Algorithm mainly consists of three steps, comprising rule construction, rule pruning, and pheromone updating, detailed as shown in Fig 2. Rule learning based on IAM. The rule learning steps are listed as follows: Step 1: Data pre-processing. The z-score standardization method is used as the data preprocessing method for the obtained sample set.
where F and σ F are the mean and standard deviation of any feature F in sample data, respectively; f' is the normalized value of f, f 2 F.
Step 2: Based on the trained ELM-based TSA model, an example sample set S can be obtained. To approximate the functions of the original ELM network, the obtained samples should be sufficient enough and cover the entire sample space with the uniform distribution.
Step 3: If there exist some examples with a same attributes combination belong to the same class in S, then a new rule R u is established, i.e. the attributes combination and the class are respectively used for the rule antecedent and rule consequent.
Step 4: Based on the trained IAM-based TSA model, a new example sample set S Ã is generated to justify the fidelity of the rule set. If the fidelity of the rule set meets the requirements, then the rule R u is retained in the discovered rule list, and the covered cases by the rule R u is removed from S; otherwise, R u is eliminated from the rule list.
Step 5: If S = ;, then stop the learning process; otherwise, go to Step 3 and repeat the process. This makes the generated rules gradually approaching the functions of the original ELM classifier.
Step 6: Output the generated rule set.

Case Studies
The effectiveness of the proposed method is tested on the New England 39-bus power system and a practical power system-the southern power system of Hebei province. All programs are implemented in MATLAB on a PC platform with the master frequency 1.81 GHz and main memory 1 GB. In order to properly assess the performance of the proposed predictive model, the wellknown model validation technique, cross-validation, is employed in the following case studies, as it provides a nearly unbiased estimate of the generalization ability of predictive models by avoiding overfitting and underfitting.

Case 1-The New England 39-bus power system
The New England 39-bus power system is the widely-used test system for TSA studies reported in the literature [10][11][12]22]. The one-line diagram of the power system is shown in Fig 3. Generation of the sample sets. Extensive time domain simulation work has been carried out to create the training and test sample sets. The simulation is done with the four-order machine model and the IEEE DC1 excitation system model, as well as the constant impedance load model. A three-phase short-circuit faults is created at instant 0.1 s and cleared at 0.2 s. A successful reclosure of the faulted line is applied after fault clearance with no topology change from the fault. A total of 4800 arbitrary samples at 80 different fault locations are created under 75%, 80%, 85%, . . .. . ., 130% of the basic load levels. Corresponding to each loading level, 5 different generator outputs are randomly set. A class label "-1" or "+1" is assigned to each sample according to maximum relative rotor angle deviation during the transient period. If the maximum relative rotor angle deviation exceeded 360 degree [22,23], the class label is given as "-1" and the system is considered to be transiently unstable; otherwise, the label is given as "+1" and the system is stable. In Figs 4 and 5, a transient stable case and an unstable case are respectively shown by plotting the rotor angle swing curves.
Parameter settings. In the proposed approach, there are only two parameters, the evaporation rate ρ and the number of ants, needed to be set. To obtain the optimal parameters, large amounts of experiments over a range of parameter settings have been carried out with the results shown in Figs 6-8. Fig 6 shows the influence of these parameters on accuracy with a wider line used for the experiments with 400 ants and varying ρ, and the experiments with ρ set to 0.85 and varying number of ants. Fig 7 and Fig 8 show the surface lines for a constant evaporation rate (0.85) and constant number of ants (400), respectively.
It might be taken for granted that the better results will be obtained if the parameters are higher, since more ants mean that more candidate rules are generated and increasing ρ will cause the convergence process to be slower. However, from a certain threshold on, an Rule Extraction Based on ELM and IAM for TSA interesting phenomenon called "a flat-maximum effect" can been found: with the increasement of the parameters, accuracy increases at first; but it has no significant increase from a certain threshold on, as shown in Figs 6-8.
Based on the experiment results, the evaporation rate ρ and the number of ants are respectively set to 0.85 and 400 (indicated with the white dot in Fig 6), since the choice provides the best performance for the proposal in the most cases. Therefore, the choice is employed as the parameter settings for our experiments.
Evaluation measures. In order to properly assess the performance of the proposed method (ELM-rules), it is tested by using the well-known 5-fold cross-validation methods. Taking into account that the predictive accuracy Acc has some kind of occasionality, the test results should be assessed in statistical basis [23]. Therefore, measures like precision Prec and the area under the receiver operating characteristic (ROC) curve AUC are also taken into account for the performance assessment of the proposed method. If a classifier model is perfect, AUC will be 1. If the model is just a random guess model, AUC will be 0.5. The value of model of AUC is greater, the model is more excellent. Considering the above three classification performance indicators, a composite indicator η is used to comprehensively evaluate the TSA Rule Extraction Based on ELM and IAM for TSA classifier models' performance [23]. η is defined as

Results and Discussion
(1) Test results. The presented method is tested by using cross-validation methods, and comparative tests are also carried out by using other relative state-of-the-art methods, including RuleFit (a rule-based ensembles method) [27], Rotation forest [28], ELM [16] and MLP. The test results are reported in Table 3 with the ROC curve shown in Fig 9. In Table 3, Acc is the average validation accuracy of the 5-fold cross validation, #R is the number of the discovered rules, #T/R is the average number of rule conditions (terms) per rule, and the numbers right after the "±" symbol are the standard deviations of the corresponding evaluation measures. The parameters of the predictive models are set as follows: the parameters of RuleFit and Rotation forest are respectively set according to the reference [27] and [28]; for the ELM classifier, the hidden layer node number is set to 50; the MLP is designed with the hidden neuron number 15, and the back-propagation algorithm with the learning rate and momentum factor 0.8 and 0.7 respectively is employed.
As is shown in Table 3, the assessment results of different predictive models are different from each other, which can be summarized according to classification ability and rule list simplicity. Concerning classification ability, the classification ability of ELM is highest with the composite indicator η getting the maximum value 0.9697; the one of MLP are lowest with η getting the minimum value 0.9386; and the ones of RuleFit, ELM-rules and Rotation forest are respectively 0.9670, 0.9643 and 0.9643. Concerning rule list simplicity, ELM and MLP are the black-box incomprehensible predictive models; among rule-based methods, the presented method is superior to the others, and Rotation forest is the worst one.
From Table 3, a good tradeoff between classification ability and rule list simplicity is clearly present: 1) the most accurate results are obtained by incomprehensible nonlinear ELM models; 2) the most accurate rule-based classifier, RuleFit, is slightly better than the second best rule-based classifier, ELM-rules, however, ELM-rules discovered rule lists simpler than that discovered by RuleFit since which has fewer the average number of rules and terms per rule; 3) ELM-rules and Rotation forest are roughly equivalent in terms of predictive accuracy (η of them are equivalent to each other), and ELM-rules discovered rule lists much simpler than that discovered by Rotation forest. The reason for this is that the proposed approach extracts transient stability rules from an example sample set generated by the trained ELM-based TSA model by using IAM algorithm. By this means, the proposal combines the advantages of ELM and IAM, resulting in the good predictive performance of the obtained rules. Therefore, the conclusion can be drawn on the basis of the evidence that the proposed method is an effective way to extract the transient stability rules, since the simplicity of a rule list tends to be even more important than its predictive ability in TSA, as motivated earlier in this paper. Case 2-The southern power system of Hebei province In order to further justify the efficacy and applicability of the proposed approach for a more complex and practical system, it is examined on the model of a practical large power systemthe southern power system of Hebei province, China. The power system covering an area of 84,000 square kilometers is a highly interconnected grid with an approximate installed capacity of 28260 MW. The modeled system comprises of 83 generators, some series compensated lines and static var compensators (SVCs).

Generation of the sample sets
Extensive simulation has been carried out to generate the sample sets. Of all 83 generators in the system, 11 generators are modeled as the six-order model, and the excitation systems and governors are considered; others the classical machine model. The load model is represented by a comprehensive model with 40% constant-impedance and 60% constant power. In the range from 90% to 120% of the basic load level, active and reactive powers of generators are set correspondingly. Contingencies created are three-phase to ground faults, and the fault clearing times are varied from five to ten cycles. A successful reclosure of the faulted line is applied after fault clearance with no topology change from the fault. The fault locations lie at 0, 25%, 50%, and 75% of the length on transmission lines. The stability criterion is as same as in Case-1. A total of 5000 arbitrary samples are created.

Prediction Results and Performance
In this Case, the performance of the proposed approach is also evaluated by using the 5-fold cross-validation method. By using the parameter setting method in Case-1, the evaporation rate ρ and the number of ants are respectively set to 0.85 and 600 through large amounts of experiments. The ROC curve of the proposal is shown in Fig 10, and the test results of the presented method and the relative state-of-the-art methods in this Case are summarized in Table 4. From Table 4, the presented results prove that the proposed algorithm is applicable to real power systems, and it is able to extract transient stability rules for a real power system-the southern power system of Hebei province. Concerning classification ability, the proposed method has roughly equivalent performance to the relative state-of-the-art works on this area; concerning rule list simplicity, the presented approach is superior to the others. Based on a comprehensive consideration of classification ability and rule list simplicity, the presented method is a good choice to solve the rule extraction problem for PRTSA.