Topology identification in distribution system via machine learning algorithms

This paper contributes to the literature on topology identification (TI) in distribution networks and, in particular, on detecting changes in the status of switching devices. A notable challenge is that distribution networks are far less instrumented with measurements than transmission networks. We propose an approach to TI of distribution systems based on supervised machine learning (SML) algorithms. The methodology analyzes the feeder’s voltage profile without requiring additional sensors or other dedicated measurement devices. We show that machine learning algorithms can track the voltage profile’s behavior in each feeder, detect the status of switching devices, identify the distribution system’s topologies, reveal the kind of loads connected or disconnected in the system, and estimate their values. Results are demonstrated on the ANSI case study.


Introduction
Traditional planning and design of power distribution grids follow the so-called "fit-and-forget" strategy [1], whereby networks are minimally monitored and actuation and sensing are scarcely deployed. With the increase in electricity demand and in the number of end users, however, the operation and control of power grids have become increasingly complex and challenging. Primary among these challenges are the scarcity of communication infrastructure to provide measurements or dispatch real-time commands; the large-scale penetration of micro-generators based on fluctuating energy resources, such as wind turbines; the local over-voltage and power-line congestion problems [2] in distributed power generation, in particular within strongly radial, resistive, low-voltage networks; and the connection of dispatchable loads to the power distribution network, for example smart buildings and plug-in electric vehicles. Modern power distribution grids would experience serious congestion problems if appropriate scheduling and coordination protocols were not enforced [3,4]. In response to these challenges and issues in the power distribution networks and [...] constant parameters, while [36], following [35], has presented an adaptive observer for a class of nonlinear systems with a general nonlinear parameterization under the assumption of boundedness of the state and unknown parameter of the system. This paper complements the extant literature by approaching the grid topology identification problem via machine learning (ML) algorithms. The rest of the paper is organized as follows. Section 2 gives a brief review of ML algorithms. Section 3 presents and discusses the main results. Finally, Section 5 summarizes our conclusions.

Machine learning
Machine learning (ML) can be described as the process through which a machine (i.e., a computer) learns to perform a task with new data or configurations without the need for re-programming [37][38][39]. The technique is tailored to uncover structure in complex data and to describe it efficiently with a minimal number of parameters [37]. The concept of ML originates from pattern recognition and computational learning theory in the field of artificial intelligence [40]. Algorithms developed with ML methods are expected to learn to perform a task on future data that were not presented to the algorithm during the training process [41].

Algorithms grouped by learning style
Different kinds of algorithms can be used to model a mathematical or statistical problem, depending on how they interact with experience or the environment, that is, with the input data. In other words, depending on the task and purpose of the problem, ML algorithms may be categorized into different classes. A general typology of ML algorithms based on learning style includes, among others, the following [42]: a) supervised machine learning (SML) algorithms; b) unsupervised machine learning (UML) algorithms; c) semi-supervised machine learning (S-SML) algorithms; d) reinforcement machine learning (RML) algorithms. In SML, the program is "trained" on a pre-defined set of "training examples," which then enable it to reach an accurate conclusion when given new data. In contrast, in UML, the program is given unlabeled data and must find patterns and relationships therein. The S-SML approach combines the advantages of the supervised and unsupervised approaches by working on both labeled and unlabeled datasets. Finally, in an RML setting, which is modeled on reinforcement psychology, an agent repeatedly interacts with the environment: at each time step, the agent (a) observes a state of the environment; (b) chooses an action based on its policy, a mapping from the observed states to the actions to be taken; and (c) receives a reward and observes the next state. This process continues until a terminal state is reached. As our main interest in this paper lies in SML algorithms, we first briefly introduce the basic concepts of this class of ML algorithms.

Supervised learning
The supervised case assumes prior knowledge of the underlying phases of the system and, as such, supervised approaches rule out the possibility of learning unknown phases. In contrast, unsupervised methods learn the structure from the data itself without the need for prior labeling [37][38]. In the supervised learning approach, inputs and their labeled outputs (targets) are provided during the training process. The algorithm then uses machine inference to develop a function capable of emulating the process and mapping new input data to a predicted output. In supervised learning, the input data are called the training data and carry a given label or real output, such as spam or not-spam, or a stock price at a given time. During training, the model is required to make predictions and is corrected whenever a prediction is wrong. The training procedure continues until the model reaches the desired level of precision on the training data. Classification and regression problems are two important examples of supervised learning [42].
The classification problem aims to assign samples to different classes. Here the target values, or output labels, are discrete or categorical, and the trained machine is called a classifier. First, a model is trained using the inputs and their labeled outputs (targets); the machine then assigns new data to one of the classes. In a regression problem, the target value or label is a continuous variable, and the trained machine is called a predictor; for each input, the machine predicts the actual target value [42]. In other words, supervised learning algorithms aim to model the association between the target output and the input features, so that output values for novel data can be predicted from the relationships learned on earlier datasets. Examples of supervised machine learning algorithms include: 1) K-Nearest Neighbors (KNN); 2) Support Vector Machine (SVM); 3) Decision Trees (DT); and 4) Ensemble Learning (EL).

K-Nearest neighbors (KNN).
The K-Nearest Neighbors (KNN) algorithm is a non-parametric method used for both classification and regression [43]. In both cases, the input consists of the k closest training samples in the feature space. The output depends on whether KNN is used for classification or regression:
1. In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
2. In k-NN regression, the output is the property value for the object. This value is the average of the values of k nearest neighbors.
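The two cases above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the Euclidean distance, the function names, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def k_nearest(train_x, query, k):
    """Return the indices of the k training points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def knn_classify(train_x, train_y, query, k=3):
    """k-NN classification: plurality vote among the k nearest neighbours."""
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_y, query, k=3):
    """k-NN regression: average of the targets of the k nearest neighbours."""
    idx = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in idx) / len(idx)


# Toy data with two features per sample (values are illustrative only).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["A", "A", "A", "B", "B", "B"]
values = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]

predicted_class = knn_classify(X, labels, (0.15, 0.15), k=3)
predicted_value = knn_regress(X, values, (0.15, 0.15), k=3)
```

With k = 1, `knn_classify` simply returns the label of the single nearest training point.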
KNN is also a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification [43].

Support vector machine (SVM).
Vapnik and Cortes (1995) introduced SVM theory, in which a hyperplane or a series of hyperplanes is constructed and used for both classification and regression [44,45]. Consider a labeled training set S = {(x_l, y_l); l = 1, ..., L} of size L, with y_l ∈ {1, -1}. The SVM is obtained by solving the optimization problem (1) subject to its constraints, where ϕ(x_l) is a nonlinear transformation that maps x_l into a high-dimensional space and is associated with a kernel function. The slack variable ξ_l accounts for training sets that are not linearly separable, and C is a tunable positive regularization parameter. To achieve a distributed SVM, (1) can be rewritten as (4), subject to the corresponding constraints, where N denotes the number of groups working together to train the SVM and w_i represents the local optimization parameter of each group. By introducing a global variable z, (4) can be reformulated as (7). To solve (7) distributively, the variables {z, w_i}, i = 1, ..., N, can be partitioned into two sets, {z} and {w_i}, i = 1, ..., N, and the Alternating Direction Method of Multipliers (ADMM) can be applied. Specifically, the scaled augmented Lagrangian function is formed, where ρ denotes the step size and μ_i represents the scaled dual variable. At each iteration k, {w_i}, {z}, and μ_i are updated in turn. The update of w_i can be carried out locally in the i-th group; it amounts to fitting an SVM to the local data with an offset in the quadratic regularization term. The update of the vector z can be solved analytically. Finally, the scaled dual variable μ_i is updated.
In constructing the historical data, the number of classes in the topology samples depends on the number of switches, so there can be more than two classes. A multi-class SVM is therefore used [46,47], and for each nonlinear classifier a Gaussian kernel is applied, in which δ is a tunable parameter. The main factors affecting the performance of the SVM are the kernel function and its parameters, as well as the soft margin parameter C. Choosing the optimal Gaussian kernel parameter and soft margin (C) improves the efficiency of a nonlinear SVM. The Gaussian kernel, which has a single parameter (γ), is a typical choice for SVM [48].
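The displayed equations referred to in this subsection can be sketched in a standard form that is consistent with the definitions given here. The block below is a reconstruction under stated assumptions, not necessarily the exact formulation used in the paper: the bias term is absorbed into w, each group i holds L_i local samples (x_{il}, y_{il}), the overbars denote averages over the N groups, and the Gaussian kernel is written with the parameter γ.

```latex
% Soft-margin SVM, centralized form
\begin{align*}
\min_{w,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{l=1}^{L}\xi_{l} \\
\text{s.t.}\quad & y_{l}\,w^{\top}\phi(x_{l}) \ge 1-\xi_{l},\qquad \xi_{l}\ge 0,\qquad l=1,\dots,L
\end{align*}

% Consensus form: N groups with local parameters w_i and a global variable z
\begin{align*}
\min_{z,\,\{w_{i},\,\xi_{i}\}}\quad & \tfrac{1}{2}\lVert z\rVert^{2} + C\sum_{i=1}^{N}\sum_{l=1}^{L_{i}}\xi_{il} \\
\text{s.t.}\quad & y_{il}\,w_{i}^{\top}\phi(x_{il}) \ge 1-\xi_{il},\qquad \xi_{il}\ge 0,\qquad w_{i}=z
\end{align*}

% ADMM iteration k with step size \rho and scaled dual variables \mu_i
\begin{align*}
w_{i}^{k+1} &= \arg\min_{w_{i},\,\xi_{i}}\ C\sum_{l=1}^{L_{i}}\xi_{il}
  + \tfrac{\rho}{2}\bigl\lVert w_{i}-z^{k}+\mu_{i}^{k}\bigr\rVert^{2}
  \quad\text{s.t. the local constraints above} \\
z^{k+1} &= \arg\min_{z}\ \tfrac{1}{2}\lVert z\rVert^{2}
  + \tfrac{\rho}{2}\sum_{i=1}^{N}\bigl\lVert w_{i}^{k+1}-z+\mu_{i}^{k}\bigr\rVert^{2}
  = \frac{N\rho}{1+N\rho}\bigl(\bar{w}^{k+1}+\bar{\mu}^{k}\bigr) \\
\mu_{i}^{k+1} &= \mu_{i}^{k}+w_{i}^{k+1}-z^{k+1}
\end{align*}

% Gaussian (RBF) kernel
\begin{equation*}
K(x,x') = \exp\bigl(-\gamma\,\lVert x-x'\rVert^{2}\bigr)
\end{equation*}
```

The closed form for z follows from setting the gradient of its quadratic subproblem to zero, which gives (1 + Nρ) z = ρ Σ_i (w_i + μ_i).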
The common practice for finding the best values of C and γ is to conduct a grid search, i.e., repeat the calculations with different C and γ combinations and determine the values yielding the best accuracy through cross-validation [49]. In Step 1, historical data are prepared by the independent system operator (ISO). In Step 2, the local and global optimization parameters are initialized.
Step 3 comprises two parts. First, the local optimization parameter w_i is obtained by solving the optimization problem defined in (12) under constraints (13) and (14). Then, the global optimization parameter z is optimized in (16), where w̄ and μ̄ are obtained by in-network processing through messages exchanged only among neighboring groups. Once each group has solved its optimization problem and w_i has converged, the local and global optimization parameters, w_i and z, respectively, are returned in Step 4.

Decision Tree (DT).
A Decision Tree (DT) is a non-parametric supervised learning model with a hierarchical structure that is used to classify different types of data; its results are presented as a flowchart with a tree-like structure. Depending on the type of the dependent variable, the algorithm falls into one of two categories: regression trees for continuous variables and classification trees for discrete variables. In the DT algorithm, the data are partitioned according to their features in the form of a tree, which, for ease of interpretation, can be written as a set of if-then rules [50]. At each stage, a feature is selected on the basis of the training data, and the dataset is split into further groups according to the chosen feature. This process continues until all the data in a group share a single label.
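The splitting procedure described above can be sketched as a small classification tree. This is an illustrative sketch only: the Gini impurity criterion, the threshold splits, the two feature columns, and the class labels are assumptions for the example and are not taken from the paper. Class labels are assumed to be strings, so that tuples can represent internal nodes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best = None  # (impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels):
    """Split recursively until every leaf holds a single label."""
    split = None if len(set(labels)) == 1 else best_split(rows, labels)
    if split is None:  # pure node, or no threshold separates the rows
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))


def predict(tree, row):
    """Follow the if-then rules from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical features per event: (voltage dip depth, recovery time).
rows = [(0.10, 0.5), (0.12, 0.6), (0.30, 0.5), (0.32, 0.4), (0.31, 1.5), (0.29, 1.6)]
labels = ["CB1", "CB1", "CB2", "CB2", "S1", "S1"]

tree = build_tree(rows, labels)
predicted = predict(tree, (0.31, 1.4))
```

Each internal node `(f, t, left, right)` reads as the rule "if feature f <= t then go left, else go right", which is the if-then form mentioned in the text.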

Ensemble Learning (EL).
Ensemble Learning (EL) methods are a machine learning approach in which multiple learning algorithms are used in parallel to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone [51,52]. Whereas a statistical ensemble in statistical mechanics is generally infinite, an ML ensemble contains a finite set of alternative models, which nevertheless allows a more flexible structure than the individual alternatives. Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or correlation (random subspace).

Bagging algorithm.
Bagging, also known as bootstrap aggregation, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. The algorithm also reduces variance and helps to prevent overfitting. Leo Breiman (1994) proposed bagging to improve classification by combining the classifications of randomly generated training sets. Bagging exploits the instability of base learners to improve the predictive performance of such unstable learners. The main idea is that, given a training set S of size n and a learner L, which is commonly a decision tree, bagging creates m new training sets S_i by sampling from S with replacement. The bagging algorithm then applies L to each S_i to build m models. The final output of bagging is based on simple averaging [53].
Even though it is commonly used in combination with DT methods, bagging can be applied with any kind of technique. In fact, bagging is a special case of the model averaging approach [52].
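The bootstrap-and-aggregate idea can be sketched as follows. This is a minimal sketch under stated assumptions: a 1-nearest-neighbour classifier stands in for the base learner L purely for brevity (the text notes that L is commonly a decision tree), the aggregation is a plurality vote, which is the classification counterpart of simple averaging, and the data and labels are illustrative.

```python
import math
import random
from collections import Counter


def one_nn(train):
    """Base learner L: a 1-nearest-neighbour classifier fitted on `train`."""
    def predict(x):
        return min(train, key=lambda s: math.dist(s[0], x))[1]
    return predict


def bagging(train, learner, m, rng):
    """Fit m models, each on a bootstrap sample S_i drawn from S with replacement."""
    n = len(train)
    return [learner([rng.choice(train) for _ in range(n)]) for _ in range(m)]


def vote(models, x):
    """Aggregate the m predictions by plurality vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Illustrative training set S: ten samples per class, two features each.
S = ([((0.01 * i, 0.02 * i), "open") for i in range(10)]
     + [((5 + 0.01 * i, 5 - 0.02 * i), "closed") for i in range(10)])

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
models = bagging(S, one_nn, m=25, rng=rng)
prediction = vote(models, (0.1, 0.1))
```

Because each bootstrap sample has the same size n as S but is drawn with replacement, some samples appear several times in a given S_i and others not at all, which is what makes the m models differ.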

Boosting algorithm.
In ML, boosting is an ensemble meta-algorithm designed primarily to reduce bias, and also variance, in supervised learning [54]. It comprises a family of ML algorithms that convert weak learners into strong ones [55]. Boosting is based on the question first posed by [56,57]: "Can a set of weak learners create a single strong learner?" A weak learner is defined as a classifier that is only slightly correlated with the true classification (although it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well correlated with the true classification. While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, the weak learners are weighted in a way that is related to their accuracy, and the data weights are then readjusted, a process known as "re-weighting." Misclassified input data gain a higher weight, while examples that are classified correctly lose weight. Thus, future weak learners focus more on the examples that previous weak learners misclassified.
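The re-weighting step can be illustrated with the update rule of AdaBoost, one well-known member of the boosting family. The source does not state which boosting update was used, so the choice of AdaBoost, the function name, and the toy labels below are assumptions made for the example.

```python
import math


def reweight(weights, y_true, y_pred):
    """One AdaBoost-style re-weighting step for labels in {-1, +1}.

    Returns the weak learner's vote weight alpha and the normalised data weights.
    """
    # Weighted error of the weak learner on the current distribution.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Assumes 0 < err < 1, i.e. the learner is neither perfect nor always wrong.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


y_true = [+1, +1, -1, -1]
y_pred = [+1, -1, -1, -1]  # the weak learner misclassifies the second sample
alpha, weights = reweight([0.25] * 4, y_true, y_pred)
```

Starting from uniform weights of 0.25, the single misclassified sample ends up with weight 0.5 and each correctly classified sample with 1/6, so the next weak learner is pushed towards the example that was previously missed.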

Random subspace algorithm.
In ML, the random subspace method, also called attribute bagging or feature bagging, is an ensemble learning method that attempts to reduce the correlation between estimators in an ensemble by training them on random samples of features instead of the entire feature set. The random subspace method is similar to bagging, except that the features are randomly sampled with replacement for each learner. Informally, this prevents individual learners from over-focusing on features that appear highly predictive or descriptive in the training set but fail to be as predictive for points outside that set. For this reason, random subspace algorithms are an attractive choice for problems in which the number of features is much larger than the number of training points. The random subspace method has been used with DT; when it is combined with "ordinary" bagging of decision trees, the resulting models are called random forests. The method has also been applied to linear classifiers, support vector machines, k-nearest neighbors, and other types of classifiers [58].
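A random subspace ensemble of nearest-neighbor learners can be sketched as follows. This is an illustration under stated assumptions rather than the configuration used in the paper: each learner is a 1-NN classifier, each learner's feature subset is drawn without repetition inside the subset (the same feature may still be picked by several learners), and the four-feature toy data are invented for the example.

```python
import math
import random
from collections import Counter


def fit_subspace_knn(train, n_learners, n_features, rng):
    """Give each learner its own random subset of feature indices."""
    dim = len(train[0][0])
    return [rng.sample(range(dim), n_features) for _ in range(n_learners)]


def predict(train, subspaces, x):
    """Each learner runs 1-NN in its own subspace; the ensemble takes a vote."""
    votes = []
    for feats in subspaces:
        def project(v):
            # Keep only the features assigned to this learner.
            return [v[f] for f in feats]
        nearest = min(train, key=lambda s: math.dist(project(s[0]), project(x)))
        votes.append(nearest[1])
    return Counter(votes).most_common(1)[0][0]


# Illustrative data: four features per sample, two classes.
train = [((0.0, 0.1, 0.0, 0.2), "A"), ((0.1, 0.0, 0.2, 0.1), "A"),
         ((1.0, 0.9, 1.1, 1.0), "B"), ((0.9, 1.0, 1.0, 0.8), "B")]

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
subspaces = fit_subspace_knn(train, n_learners=11, n_features=2, rng=rng)
prediction = predict(train, subspaces, (0.1, 0.1, 0.1, 0.1))
```

Since k-NN is a lazy learner, "fitting" here only fixes which features each learner will look at; all distance computations happen at prediction time.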

Rationale for choosing a dataset.
Power system analysis is generally carried out by monitoring the current status of the system and computing several distinct parameters. One of the most important of these parameters is the transient voltage. As is well known, any change in the network, such as a line or generation outage, affects the network's operating status and its parameters, such as the voltage in transient and steady-state modes. Tracking the current status of the power system therefore requires analysis of the transient voltage. To simulate a fault in the network, we used switches, so that when a switch is opened, a line or generation unit is disconnected from the network. Analyzing the power system after a fault thus requires tracking the changes in the network's transient voltage. The changes and faults that occur in the network are exactly proportional to the voltage changes, which means that a machine learning model can learn the voltage changes in all possible fault scenarios and, after training, can detect the type of fault and the fault scenario that occurred.

The source of the data and how it was obtained.
We simulated a real standard power distribution system in ETAP software. To simulate the fault, we used 7 power switches,

Tracking the switching devices
First, we analyze the status of each switching device, such as a circuit breaker, and its impact on the main feeder's voltage profile. Fig 2 shows the ANSI case study as an example distribution network, comprising a main bus connected to the utility, three sub-buses, and several AC and DC sub-networks. In this case study, we consider seven switching devices, such as circuit breakers, which are shown in red in Fig 2. Thus, we apply 7-class supervised learning. We define an event for each device separately and analyze the transient stability of the system in ETAP software.

PLOS ONE
We assume that each device, separately, is opened at t = 5, closed at t = 10, and opened again at t = 15. For some switching devices, voltage profiles at different buses were analyzed. Figs 3-7 show the voltage profiles related to each switching event. According to these graphs, the characteristics of each voltage profile at the switching instant are more informative than the switching time itself, since each circuit breaker or other switching device may be switched several times over the same period of time.
Moreover, the voltage profile related to each typical switching event displays an intermittent nature. According to each switching event's voltage profile, we categorize the events into seven classes to train the model. Since there are seven switching devices, the resulting trainable machine is a 7-class classifier. In Fig 2, after CB3 is switched, Bus 2 (Sub2B) is disconnected from Bus 3; not only does one transformer go out of the circuit, but the switching of the DC source also affects Bus 2B, whose voltage profile resembles a parabola, as shown in Fig 4. Fig 5 shows a similar situation. Figs 6 and 7 show the voltage profile at Bus 3 when switches S3 and S1 are operated, respectively.
In summary, using the supervised paradigm, we can detect which circuit breaker or other switching device has been switched, and we can identify whether it has been connected or disconnected. Fig 8 displays the topology identification (TI) process and its two tasks: 1) detection of the switch number and 2) detection of the switching state. Tables 1-3 show the performance of the KNN, SVM, and EL algorithms, respectively, in the test process that follows training. In the test process, the machine is evaluated on samples that it has not observed before. The accuracy of the learning algorithm is commonly evaluated as the fraction of test samples that are classified correctly: Accuracy = (number of correctly classified samples / total number of samples) × 100%.

The more test samples the machine classifies correctly, the more accurate it is. The results clearly indicate that supervised algorithms can identify the status of the switches and the system's topology with a significant degree of robustness. According to Tables 1 and 2, all of the KNN and SVM algorithms perform well and can be viewed as suitable algorithms for topology identification, with high accuracy and low computation time. According to Table 3, the EL algorithms (bagging and random subspace KNN) also perform well. However, the accuracy of boosting and of the RUS boosted tree is not as good as expected. Fig 9 shows the classification error of 10-fold cross-validation for EL with the KNN subspace method. As can be seen, selecting 100 learners gives the model adequate accuracy. According to Fig 9, the error rate is 5.5% in the test process, suggesting that we can detect any switching event and identify the distribution network's topologies with a satisfactory degree of accuracy.
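The accuracy measure and the 10-fold cross-validation error mentioned above can be sketched as follows. This is a generic illustration, not the evaluation code behind Tables 1-3 or Fig 9: the interleaved fold assignment, the `fit_predict` callback interface, the one-dimensional nearest-neighbor classifier, and the toy data are assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Percentage of test samples whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds; yield (train_idx, test_idx) pairs."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_error(samples, labels, fit_predict, k=10):
    """Mean classification error (in percent) over k held-out folds."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        preds = fit_predict([samples[j] for j in train_idx],
                            [labels[j] for j in train_idx],
                            [samples[j] for j in test_idx])
        errors.append(100.0 - accuracy([labels[j] for j in test_idx], preds))
    return sum(errors) / k


def nearest_label(train_x, train_y, test_x):
    """Hypothetical classifier: label of the nearest training value (1-NN in 1-D)."""
    return [train_y[min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))]
            for x in test_x]


# Illustrative one-feature data: two well-separated classes of ten samples each.
samples = [0.01 * i for i in range(10)] + [1.0 + 0.01 * i for i in range(10)]
labels = ["open"] * 10 + ["closed"] * 10

example_accuracy = accuracy(["CB1", "CB2", "S1", "S1"], ["CB1", "CB2", "S1", "CB1"])
cv_error = cross_val_error(samples, labels, nearest_label, k=10)
```

Every sample is held out exactly once, so the cross-validation error is always measured on data the model did not see during training, in line with the test process described in the text.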

Tracking of the special load switch
It is possible that, after a switching event, several loads are disconnected from the system at the same time. In this section, we show that our approach is capable of detecting switching events when only one load is connected to the switch. We assume several switches, each with only one type of special load connected to it, so that after a switching event only one load is connected to or disconnected from the feeder. We consider three different types of loads, i.e., induction motor, static load, and electric vehicle. The main goal is to categorize the voltage profiles related to these three types of loads. Figs 10-12 show the voltage profiles related to these switching events. Fig 10 shows the voltage profile when the event is on the static load, that is, when the switch for the static load is connected to or disconnected from the rest of the grid. Fig 11 shows the voltage profile when the event is on the induction motor, which is disconnected from the grid twice. Fig 12 shows a parabolic voltage curve, indicating that a DC source has been disconnected from the network. Table 4 shows the accuracy results. We conclude that the proposed methodology can accurately detect the type of load connected to or disconnected from the distribution system. Fig 13 shows the percentage error of ensemble learning with the KNN random subspace method. With 100 learners, the model has a satisfactory topology identification (TI) performance. As one can observe, the error is below 6 percent.
According to Fig 13, the test process presents an error rate of 5 percent. This result suggests that we can detect any switching event on various loads and track the special load switch.

Conclusions
This paper proposes an approach to topology identification (TI) of the distribution system based on machine learning algorithms. The methodology does not require any additional sensors or measurement devices to analyze the feeder's voltage profile. This is possible because changes in the switching devices affect the feeder's voltage profile in the distribution network. We show that by tracking the voltage profile's behavior in each feeder with an ML algorithm, we can detect the status of the switching devices and identify the distribution system's topologies. It should be noted that the concept of voltage used in the paper is confined to the transient voltage. That is, the transient voltage generated by one event in the network is completely different from the one generated by another event. In other words, the voltage across the network depends on the event that occurred. The transient voltage acts as a label for a specific event: different switching modes are detected by examining the transient voltage, which ultimately allows the network topology to be diagnosed. However, when the voltage is in steady-state mode, identification from the voltage curves alone cannot resolve a non-radial grid topology. We also show that, by tracking the status of the switching devices, the ML approach can detect with a high degree of accuracy which kind of load is connected to or disconnected from the system.

Professor, Faculty of Engineering, Kharazmi University, Tehran, Iran) for his critical reading and constructive feedback on part 3.2 of the paper, "Tracking of the special load switch" (especially Figs 10-12); Dr. Kamran Ahmadgoli (Associate Professor, Faculty of Literature and Humanities, Kharazmi University, Tehran, Iran) for his English language assistance, editing and proofreading of the paper; and Dr. Seyed Mahdi Ghamkhari (Assistant Professor, Department of Electrical and Computer Engineering, University of Louisiana at Lafayette, USA) for his preliminary advice to Peyman Razmi (the first author of the paper) during the first stage of planning and developing this research, around two years ago. Despite our best efforts through calls and emails over the last two months, Dr. Seyed Mahdi Ghamkhari was not available for his final judgment on the paper. Nevertheless, we would like to express our heartfelt gratitude to him and the rest of the friends and experts who helped this project take shape and come to final fruition in the complicated atmosphere of the global COVID-19 pandemic.