Classification of Partial Discharge Measured under Different Levels of Noise Contamination

Cable joint insulation breakdown may cause huge losses to power companies. Therefore, it is vital to diagnose the insulation quality to detect early signs of insulation failure. It is well known that there is a correlation between partial discharge (PD) and insulation quality. Although much work has been done on PD pattern recognition, it is usually performed in a noise-free environment. Also, work on PD pattern recognition in actual cable joints is rarely found in the literature. Therefore, in this work, classification of actual cable joint defect types from partial discharge data contaminated by noise was performed. Five cross-linked polyethylene (XLPE) cable joints with artificially created defects were prepared based on the defects commonly encountered on site. Three different types of input feature were extracted from the PD pattern under an artificially created noisy environment. These include statistical features, fractal features and principal component analysis (PCA) features. These input features were used to train the classifiers to classify each PD defect type. Classification was performed using three different artificial intelligence classifiers: Artificial Neural Networks (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Machine (SVM). It was found that the classification accuracy decreases with higher noise level, but PCA features used in SVM and ANN showed the strongest tolerance against noise contamination.


Introduction
The operating life span of important power system equipment such as gas insulated switchgear, transformers and high voltage (HV) power cables is highly dependent on the insulation quality. Such equipment will be permanently damaged if insulation breakdown occurs. Failure in any part of the power system will be detrimental to energy generation and transmission companies. Hence, it is extremely important to check the insulation quality frequently. Partial discharge (PD) measurement is globally accepted as a useful diagnostic technique for assessing the condition of insulation material [1]. According to the IEC 60270 standard, PD is defined as "localized electrical discharge that only partially bridges the insulation between conductors" [2]. PD is repetitive in nature and able to spread across the dielectric material. PD intensifies existing insulation impairment and causes steady deterioration of insulating quality, ultimately leading to electrical breakdown, hazard to personnel, environmental damage and costly equipment failures [3]. Since PD events may lead to disastrous results with both safety and financial consequences, detection of PD events is used as a key method in insulation system condition monitoring [4].
PD classification is of interest because of the relationship between PD activity and the aging process of dielectric materials. Since each defect has a unique deterioration behavior, it is important to recognize the association between the PD patterns and the defect type in order to determine the insulation quality. PD pattern recognition is crucial in determining whether there is a substantial risk of imminent insulation breakdown and, consequently, whether the component requires servicing or replacement [5]. Much work has been performed on PD classification in various power system equipment, such as gas insulated switchgears and substations [6,7], power cables [8,9] and transformers [10,11]. Commonly used classifiers include neural networks [7,12], fuzzy logic [13,14] and support vector machines [15,16].
PD has a group of unique discriminatory attributes that allow it to be recognized. In order to perform PD classification, it is necessary to choose which discriminatory features are to be extracted and which feature extraction method is to be used [17]. The purpose of feature extraction is to extract meaningful input features from the unprocessed PD data to represent the PD pattern associated with a specific defect [18]. These extracted features are used as the input of the classifier during the training process. Feature extraction also helps to reduce the size of raw PD data for quicker and simpler handling. PD classification requires some sort of data reduction method, such as reducing the matrix size [19]. This is because unprocessed PD data, which may contain thousands to millions of individual pulses, are too large to be used as input to the classifiers: they would drastically increase the training time and cripple the performance of the classifier [20,21].
Most PD classification work has been performed in a laboratory, under noise-free conditions. However, in reality, on-site PD measurement suffers from lower detection sensitivity due to the interference of external noise [22]. PD measurement often faces interference caused by radio transmissions, power electronics components, random noise from switching, lightning, arcing, harmonics and interference from ground connections [23]. A lot of research work has been performed on denoising PD data. One of the methods involves setting a threshold and ignoring PD data that are below 10% of the maximum PD amplitude. However, it was found to be insufficient, as a high threshold level might neglect real PD pulses with low magnitude and a low threshold level will include noise [24,25]. In a comparison of 28 types of denoising technique using the mean square error as a benchmark, wavelet-based denoising was found to be the best, with a good signal-to-noise ratio [26]. Numerous research works have also used the wavelet transform for denoising purposes, especially the Daubechies wavelet, which is capable of detecting high-frequency, fast-decaying, short-duration and low-amplitude signals [27,28].
PD denoising techniques have improved over the years. However, a perfect and universal denoising standard has yet to be achieved. Therefore, some researchers have added artificial noise signals to PD data before evaluating the PD classification model in order to replicate the practical scenario. Examples include adding evenly distributed random numbers to the phase and charge of PD data [1,29], adding white noise with zero mean and fluctuating power [23], including random numbers with various standard deviations and zero mean [30] and merging randomly distributed noise that is within 10 to 30% of the test data [31-37]. The effects of adding noise are summarized as follows. In [1], the accuracy of ANN reduces from 79% under noise-free conditions to 42.2% with 10% added noise. In [36,37], the accuracy of ANN reduces from 100% under noise-free conditions to 80% with 30% random noise. In [34,38], when 30% noise was introduced, the accuracy of ANN reduces from 100% to between 70 and 80%, depending on the input feature used. In [31], ANN accuracy reduces from 93.7% to 83.3%.
However, artificially generated noise using software, as applied in previous works, may not represent the real-world scenario. Therefore, in this work, classification of cable joint defect types from PD measurement under a noisy environment was performed. Real noise obtained from ground interference was used in this work instead of the software-generated noise commonly used in past works. This is a better representation of the noise encountered on site. Five cross-linked polyethylene (XLPE) cable joints with artificially created defects were prepared. After PD measurement was performed on each cable joint sample, different input features were extracted from the PD pattern under an artificially created noisy environment. These include statistical features, fractal features and principal component analysis (PCA) features. The input features were used to train the classifiers to classify the PD defect types using Artificial Neural Networks (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Machine (SVM). At the end of the work, different combinations of feature extraction methods and classifiers were compared to determine which has the highest classification accuracy or the highest noise tolerance.
Time series analysis is a very useful tool. The directed weighted complex network method can be used to distinguish and characterize different dynamical regimes associated with unstable periodic orbits from time series signals [39]. For nonlinear dynamic behavior in gas-liquid two-phase flow, the multivariate weighted complex network can be used [40]. On the other hand, multivariate pseudo Wigner distribution allows uncovering local flow behavior revealing different oil-water flow patterns [41]. Gao et al. proposed a multiscale limited penetrable horizontal visibility graph to analyze nonlinear time series [42] and then developed a novel AOK-TFR based visibility graph to classify epileptic EEG signals [43].
The rest of the paper is organized as follows. Section 2 describes the preparation of the test samples. In Section 3, the measurement setup is outlined. Section 4 elaborates on the feature extraction methods used. The classifiers used are explained in Section 5, followed by the results in Section 6. Lastly, the conclusion can be found in Section 7.

Sample Preparation
Five 11 kV XLPE cable joints with different artificial defects were prepared. The total length of each cable sample is 3 meters, with a cable joint located in the centre. The details of the defect nature of all cable joint samples are shown in Table 1.
Insulation incision defect was prepared by creating a shallow cut at the XLPE surface using a blade. Axial direction shift defect was prepared by inserting the cable at a shifted angle. Semiconductor layer tip defect was made by making numerous sharp edges at the semiconductor

PD Measurement Setup
Fig 1 shows the block diagram of the PD measurement system that was used in this work. The measurement setup comprises a step-up transformer that serves as a high voltage source, a coupling device, a test object, a coupling and measuring capacitor, a USB controller and a PD detector connected to a personal computer (PC). The PC was used to store the measured PD data. A commercial PD detector, the MPD600 from Omicron, was used in this work. All measurements were performed at 9 kV, which is slightly less than the 11 kV rated voltage of the cable. This is because operating at a higher applied voltage would significantly increase the likelihood of insulation breakdown at the cable joint defect, which would cause permanent damage to the test sample. Each cable joint was energized to 9 kV and left for 1 hour for the PD to reach a steady state before PD measurement was taken. Each PD measurement was taken for 1 minute, with a time gap of 15 seconds between measurements. A total of 100 measurements were performed on each cable joint sample. The results are shown in terms of phase resolved partial discharge (PRPD) patterns, a 3D plot with phase, charge magnitude and pulse count as the main axes.
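As an illustration of how a PRPD pattern relates phase, charge magnitude and pulse count, the following is a minimal sketch that bins individual pulses into a phase-charge grid and counts the pulses in each cell. The function name, the (phase, charge) data layout and the bin widths are assumptions made for this example; they do not describe the internal processing of the MPD600.

```python
from collections import Counter

def prpd_histogram(pulses, phase_bin_deg=10.0, charge_bin_pc=50.0):
    """Count pulses per (phase bin, charge bin) cell.

    `pulses` is a list of (phase in degrees, charge in pC) pairs.
    The bin widths are hypothetical values chosen for illustration.
    Returns a Counter keyed by (phase_index, charge_index).
    """
    counts = Counter()
    for phase, charge in pulses:
        phase_idx = int((phase % 360.0) // phase_bin_deg)
        # Floor division keeps negative charges in negative-index bins.
        charge_idx = int(charge // charge_bin_pc)
        counts[(phase_idx, charge_idx)] += 1
    return counts

# Hypothetical pulses: two fall in the same cell, two elsewhere.
pulses = [(12.0, 120.0), (15.0, 130.0), (95.0, 40.0), (200.0, -60.0)]
hist = prpd_histogram(pulses)
```

The pulse count in each cell is the third axis of the 3D PRPD plot.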

Feature Extractions
In this work, three different feature extraction methods were used to obtain relevant identifiers from the PD data: statistical features, fractal features and principal component analysis (PCA) features. These features were chosen because they are the most commonly used features in PD classification. They were also combined so that the classifiers could use multiple features instead of individual features, in order to enhance performance. These input features were used to train the ANN, ANFIS, and SVM to classify defect types.

Statistical Features
PD data can be characterized by two main distributions: the pulse count distribution, which is the number of PD pulses vs. phase angle, and the pulse height distribution, which is the PD charge magnitude vs. phase angle. Each of these distributions can be further split into two separate distributions, for the negative and positive half cycles. Statistical features were extracted from these PD distributions, which include skewness, kurtosis, mean, variance and the Weibull parameters.
Skewness is the degree of asymmetry of the distribution with regard to the normal distribution. Positive skewness shows that the distribution is asymmetric with a bigger left side, zero skewness shows that the distribution is symmetric and negative skewness shows that the distribution is asymmetric with a larger right side [44].
Kurtosis is the degree of sharpness of the distribution with regard to a normal distribution. Zero kurtosis shows that the distribution has a normal shape, positive kurtosis shows that it has a sharp shape and negative kurtosis shows that it has a flat shape [45].
Variance is a measurement of how much a cluster of numbers is spread out. Zero variance shows that all values in the distribution are identical. The standard deviation is obtained by taking the square root of the variance. A very detailed mathematical description of skewness and kurtosis can be found in [46]. The mean, variance, skewness and kurtosis are calculated using
\[ \mu = \frac{\sum_{i=1}^{N} x_i f(x_i)}{\sum_{i=1}^{N} f(x_i)} \]
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2 f(x_i)}{\sum_{i=1}^{N} f(x_i)} \]
\[ S_k = \frac{\sum_{i=1}^{N} (x_i - \mu)^3 f(x_i)}{\sigma^3 \sum_{i=1}^{N} f(x_i)} \]
\[ K_u = \frac{\sum_{i=1}^{N} (x_i - \mu)^4 f(x_i)}{\sigma^4 \sum_{i=1}^{N} f(x_i)} - 3 \]
where f(x_i) is the function of interest, N is the size of the data and x_i are the discrete values of the distribution. Weibull analysis is a mathematical approach to characterize the pulse height analysis pattern. The probability distribution of the PD pulse rate, F, can be expressed by the Weibull function [20,47] as
\[ F(q; \alpha, \beta) = 1 - \exp\left[-\left(\frac{q}{\alpha}\right)^{\beta}\right] \]
where the parameters α and β represent each pulse height analysis curve and q is the PD pulse amplitude. The features α+, β+, α− and β− are obtained from the positive and negative pulse height analysis curves [20]. The pulse height analysis pattern is thus compacted using the Weibull method for statistical analysis while keeping its relevant information. The values of α+, β+, α− and β− are then used as the input to the intelligent classifiers along with variance, skewness, kurtosis and mean.
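The statistical features can be illustrated with a short sketch. It assumes the frequency-weighted form of the moments that is common for PD distributions, in which f(x_i) weights each discrete value x_i and kurtosis is reported as the excess over a normal distribution (so that zero corresponds to a normal shape), together with the two-parameter Weibull cumulative function. The function names and the sample distribution are hypothetical.

```python
import math

def weighted_moments(x, f):
    """Return (mean, variance, skewness, excess kurtosis) of values x weighted by f."""
    total = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / total
    var = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / total
    sigma = math.sqrt(var)
    skew = sum((xi - mean) ** 3 * fi for xi, fi in zip(x, f)) / (sigma ** 3 * total)
    kurt = sum((xi - mean) ** 4 * fi for xi, fi in zip(x, f)) / (sigma ** 4 * total) - 3.0
    return mean, var, skew, kurt

def weibull_cdf(q, alpha, beta):
    """Two-parameter Weibull cumulative function 1 - exp(-(q/alpha)^beta)."""
    return 1.0 - math.exp(-((q / alpha) ** beta))

# Hypothetical symmetric pulse count distribution over five phase windows.
phase = [0.0, 1.0, 2.0, 3.0, 4.0]
count = [1.0, 2.0, 4.0, 2.0, 1.0]
mean, var, skew, kurt = weighted_moments(phase, count)
```

For this symmetric example the skewness is zero, and the negative kurtosis indicates a shape flatter than a normal distribution.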

Fractal Features
Fractal features are suitable for modeling complex shapes and natural phenomena where current mathematical methods are found to be inadequate. Since PD can be treated as a natural phenomenon that has complicated shapes and surfaces, fractal features can be used to model it. The implementation of fractal features in PD recognition is interesting because it characterizes the PRPD pattern directly [48]. Fractal features can also be used for pattern recognition [49]. A PRPD pattern can be characterized using two fractal features, fractal dimension and lacunarity, which are computed using the box counting technique. Fractal dimension is one of the main fractal features that can be computed from an image surface. In theory, fractal dimension is invariant to changes in scale and has the potential to be used for measuring the coarseness of the surface. However, fractal dimension alone is not enough to be a discriminatory feature because different surfaces may have exactly the same value of fractal dimension. In order to solve this problem, Mandelbrot introduced a new variable called lacunarity, which represents the compactness of the fractal surface. Both fractal dimension (D) and lacunarity (Λ) are functions of the box size L. The number of boxes N of side L needed to cover a fractal set is governed by
\[ N(L) = K L^{-D} \]
where D is the fractal dimension of the set and K is a constant [50]. Lacunarity is given by
\[ \Lambda(L) = \frac{M^2(L) - [M(L)]^2}{[M(L)]^2} \]
with
\[ M(L) = \sum_{m=1}^{N} m \, p(m, L), \qquad M^2(L) = \sum_{m=1}^{N} m^2 \, p(m, L) \]
where m is the box number and p(m, L) is the associated probability distribution. In this work, PRPD patterns were converted into a binary image and the software ImageJ was used to calculate the fractal dimension and lacunarity using the box counting method [52].
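The box counting estimate of the fractal dimension can be sketched as follows. This is only an illustration of the technique, not the ImageJ implementation used in this work: it counts the occupied boxes N(L) of a binary image for several box sizes L and takes the least-squares slope of log N(L) against log(1/L) as D. The function names, the box sizes and the test images are assumptions.

```python
import math

def box_count(image, box):
    """Number of box-by-box cells containing at least one foreground pixel."""
    occupied = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value:
                occupied.add((r // box, c // box))
    return len(occupied)

def fractal_dimension(image, box_sizes):
    """Least-squares slope of log N(L) against log(1/L)."""
    xs = [math.log(1.0 / size) for size in box_sizes]
    ys = [math.log(box_count(image, size)) for size in box_sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Two hypothetical 16x16 binary images: a filled square and a single line.
filled = [[1] * 16 for _ in range(16)]
line = [[1] * 16] + [[0] * 16 for _ in range(15)]
d_filled = fractal_dimension(filled, [1, 2, 4, 8])
d_line = fractal_dimension(line, [1, 2, 4, 8])
```

The filled square yields a dimension of 2 and the line a dimension of 1, which is the expected behaviour of N(L) proportional to L^(-D).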

Principal Component Analysis
PCA, also known as the Karhunen-Loève (K-L) method, is a data reduction method that can filter out the important factors from a large group of data [53]. It is able to transform the data from a very high dimension to a lower dimension with only minimal loss of information in the reduced space. This is achieved by projecting the data onto the lower-dimensional directions with the largest variance, which maximizes the scatter of the projected samples [54]. This linear subspace is found by solving the eigenproblem
\[ \mathrm{cov}(X)\, M = \lambda M \]
where cov(X) is the covariance matrix of the dataset X, M is a linear mapping formed by the d principal eigenvectors of the covariance matrix and λ are the d principal eigenvalues. The low-dimensional data y_i of the data points x_i are calculated using the linear mapping Y = XM. The elements of Y produce the feature sets [17]. The covariance matrix determines which directions contain the most significant variance in the dataset, making PCA an effective tool for feature subset selection. The most important concern in PCA is the number of principal components required to obtain an accurate representation of the original data. The best number of principal components can be found by using a scree plot, which is a graph of the eigenvalue magnitude vs. its number. The best number is chosen at the point where the graph has a sudden change in slope, with the slope on its left side much steeper than on its right side [53].
The PD data were arranged into a three-column matrix of phase, magnitude and pulse count, which is similar to the PRPD format. Two situations were considered for PCA feature extraction. In the first situation, the PD matrix was split into four distributions: the negative and positive sections of the charge magnitude, with the phase divided into two 180-degree sections. In the second situation, the PD matrix was arranged into six distributions: the negative and positive sections of the charge magnitude, with the phase divided into four 90-degree quadrants. PCA was performed on these distributions to obtain the first and second principal components.
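The projection onto the first and second principal components can be sketched with NumPy as follows. This is a minimal illustration that assumes the data are centred before projection and that the eigenvectors of the covariance matrix are sorted by decreasing eigenvalue; the function name and the small three-column matrix (standing in for phase, magnitude and pulse count) are hypothetical.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the d principal eigenvectors of cov(X)."""
    Xc = X - X.mean(axis=0)                 # centre each column (assumed step)
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    M = eigvecs[:, :d]                      # linear mapping of d principal eigenvectors
    return Xc @ M, eigvals

# Hypothetical three-column data matrix.
X = np.array([[2.0, 0.1, 1.0],
              [4.0, -0.1, 2.1],
              [6.0, 0.2, 2.9],
              [8.0, -0.2, 4.0]])
Y, eigvals = pca_project(X, d=2)
```

The sorted eigenvalues returned here are the values that would be plotted in a scree plot to choose the number of components.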

Classifiers
Three intelligent classifiers were used; they are Adaptive Neuro-Fuzzy Inference System (ANFIS), Artificial Neural Network (ANN) and Support Vector Machine (SVM). These classifiers were trained and then used to classify defect types of the cable joint samples in this work.

Artificial Neural Network (ANN)
ANN is suitable for PD classification because it is insensitive to small input changes. ANN has the ability to continue making correct decisions even when the input presented is slightly different from the input used during training process. This is very important for PD classification where the discharge patterns are usually not the same [55].
ANN consists of one input layer, a minimum of one hidden layer and one output layer [1,56]. The feed forward back propagation neural network is the most commonly used learning mode in ANN [57]. It is a supervised learning network that is trained in a forward-backward process. In the forward process, the biases and weights are initialized to small random values. The feature vector of the corresponding sample is then used to compute the neuron outputs in each layer using an activation function, for which different threshold functions can be used [12].
Every layer in the ANN is completely connected to the following layer. The main purpose of the hidden layer is to obtain PD features from different sources and send the information to the output layer. The number of processing elements in the input layer depends on the amount of PD fingerprint data. The number of processing elements in the output layer depends on the number of defect types to be classified [45]. For PD classification purposes, a minimum of two input features is required to avoid divergence during training [58]. In this work, a multilayer feed forward ANN with 15 neurons in the hidden layer and the scaled conjugate gradient back propagation training function was used. The sigmoid function was used as the activation function in the hidden layer and the output layer.
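The forward process of such a network can be sketched as follows. This is an illustration only: it shows a fully connected network with 15 sigmoid hidden neurons and five sigmoid outputs (one per defect type) with small random initial weights, and it does not implement the scaled conjugate gradient training used in this work. The number of input features (eight) and the weight range are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
    return weights, biases

def forward(layer, inputs):
    """Sigmoid output of every neuron in the layer for the given inputs."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)
n_features, n_hidden, n_classes = 8, 15, 5   # n_features is hypothetical
hidden_layer = init_layer(n_features, n_hidden, rng)
output_layer = init_layer(n_hidden, n_classes, rng)

features = [0.2] * n_features                # hypothetical feature vector
outputs = forward(output_layer, forward(hidden_layer, features))
predicted_class = outputs.index(max(outputs))
```

Before training, the outputs carry no meaning; the backward process adjusts the weights and biases so that the largest output corresponds to the correct defect type.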

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS uses neural networks and fuzzy systems to find the best fuzzy parameters [59]. The use of a neural network removes the need to select the fuzzy parameters manually because this is done by the neural network. The fuzzy system must be built using fuzzy logic prior to the fuzzy scheme training in ANFIS. ANFIS is based on the fuzzy Sugeno model introduced by Takagi, Sugeno and Kang. ANFIS is a useful tool for mapping PD patterns to the defect type using If-Then rules formed by the decision tree and the stipulated input-output data [60].
The ANFIS architecture has five layers [61]. The first layer consists of adaptive nodes, whose outputs are the fuzzy membership grades of the inputs. The second layer contains fixed nodes that act as multipliers for the incoming signals; the output of this layer is the firing strength of the rules. The third layer contains fixed nodes that normalize the firing strengths from the second layer. The fourth layer consists of adaptive nodes, each of which outputs the product of a first-order polynomial and the normalized firing strength. The last layer contains only one fixed node, which sums all output signals from the previous layer.
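The five layers can be traced in a short forward-pass sketch of a first-order Sugeno system. Gaussian membership functions are assumed here for illustration, and the rule parameters are hypothetical values rather than parameters obtained by training.

```python
import math

def gaussian_mf(x, centre, sigma):
    """Gaussian membership grade of x (assumed membership function shape)."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def anfis_forward(inputs, rules):
    """One forward pass through the five layers of a first-order Sugeno ANFIS.

    Each rule is (mf_params, coeffs): mf_params holds one (centre, sigma)
    pair per input; coeffs holds one coefficient per input plus a constant.
    """
    # Layer 1: membership grade of every input under every rule.
    grades = [[gaussian_mf(x, c, s) for x, (c, s) in zip(inputs, mf)]
              for mf, _ in rules]
    # Layer 2: firing strength of a rule is the product of its grades.
    strengths = [math.prod(g) for g in grades]
    # Layer 3: normalize the firing strengths.
    total = sum(strengths)
    normalised = [w / total for w in strengths]
    # Layer 4: normalized strength times a first-order polynomial of the inputs.
    weighted = []
    for w, (_, coeffs) in zip(normalised, rules):
        poly = sum(p * x for p, x in zip(coeffs[:-1], inputs)) + coeffs[-1]
        weighted.append(w * poly)
    # Layer 5: single node that sums the rule outputs.
    return sum(weighted), normalised

# Two hypothetical rules over two inputs.
rules = [
    ([(0.0, 1.0), (0.0, 1.0)], [1.0, 1.0, 0.0]),
    ([(1.0, 1.0), (1.0, 1.0)], [0.0, 0.0, 2.0]),
]
output, normalised = anfis_forward([0.5, 0.5], rules)
```

With the input midway between the two rule centres, both rules fire equally and the output is the average of the two rule polynomials.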
Rule fuzzification is done by allocating a fuzzy membership function (MF) to each condition in the premise part of the rules. Each input variable is normalized between zero and one in order to increase the training efficiency [62]. Utilizing these fuzzy rules, ANFIS is used to train, test and analyze the Sugeno-type fuzzy inference system [63]. Every rule output is a linear combination of the input variables plus a fixed value. The final output is the weighted mean of the rule outputs. These weights are automatically altered using the information acquired during the training process. For ANFIS, the Matlab command "genfis2" was used to generate a Sugeno-type fuzzy inference system using subtractive clustering. "Genfis2" was used instead of "genfis1" since it is more suitable for the large amount of data used in this work. The ANFIS used has 20 "epoc" and 1 "radii", where "epoc" is the maximum number of training epochs before the training process is stopped and "radii" is a vector that specifies the range of influence of a cluster center in each data dimension, assuming the data fall within a unit hyperbox.
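The normalization of each input variable between zero and one can be sketched as a min-max scaling of every feature column. This is an assumed form of the normalization, written for illustration; the function name and the sample values are hypothetical.

```python
def min_max_normalise(columns):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return scaled

# Two hypothetical feature columns, one of them constant.
features = [[2.0, 4.0, 6.0], [10.0, 10.0, 10.0]]
scaled = min_max_normalise(features)
```

Because every column is stretched to the same range, the original difference in scale between columns is lost after this step.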

Support Vector Machine
SVM is a machine learning algorithm that stems from statistical learning theory. Its main concept is the kernel, which is used for a variety of learning tasks. Using kernel methods, SVMs can be adapted to multiple types of task by using different base algorithms and kernel functions. SVM excels in pattern recognition problems involving nonlinearity, small sample sizes and high dimensionality [64].
SVM is a method for finding functions from a set of known data, called the training data. Each group of PD pattern data can be characterized by specific input features. Therefore, each group of data can be represented by a vector whose size and dimension depend on the number of input features selected to characterize it. The function can be either a classification function or a regression function, so SVM is commonly used for both classification and regression problems.
SVM was initially intended to handle linearly separable cases. Unfortunately, not all practical problems are linearly separable. When dealing with non-linear problems, conventional SVM as a linear classifier will not function effectively. To overcome this problem, a technique known as the kernel method was introduced to deal with non-linear problems using multiple linear classifiers. According to pattern recognition theory, a model that is not linearly separable in a lower dimensional space can be made linearly separable by mapping it nonlinearly into a higher dimensional feature space. The use of the kernel method thus avoids the curse of dimensionality [15].
The SVM algorithm was initially intended for binary classification, which means it can only classify inputs into two classes [65]. This is because SVM uses a hyperplane to split data into two categories. If more than two classes are required, multi-level SVM is needed [66,67]. Multi-level SVM is a one-against-all classifier, in which multiple binary SVMs are trained. During multi-level SVM training, the samples of one category are treated as one class while all remaining samples are treated as the other class. In this work, a multi-level SVM with the radial basis function kernel was used as the classifier.
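The two ingredients named above, the radial basis function kernel and the one-against-all scheme, can be sketched as follows. The sketch does not train an SVM; it only shows the kernel evaluation, the relabelling of a multi-class problem into one binary problem per class, and the selection of the class with the largest decision score. The function names, the kernel width gamma, the short class names and the scores are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def one_vs_all_labels(labels, positive_class):
    """Relabel a multi-class problem as +1 (target class) versus -1 (all others)."""
    return [1 if y == positive_class else -1 for y in labels]

def predict_one_vs_all(scores_by_class):
    """Pick the class whose binary classifier gives the largest decision score."""
    return max(scores_by_class, key=scores_by_class.get)

# Hypothetical labels and decision scores for three defect classes.
labels = ["incision", "shift", "tip", "incision"]
binary = one_vs_all_labels(labels, "incision")
winner = predict_one_vs_all({"incision": -0.3, "shift": 0.8, "tip": 0.1})
```

One binary classifier is built per defect type, so five defect types require five binary SVMs under this scheme.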

Results
The measurement results from this study are shown in this section. The results include the PD patterns measured and classification accuracy results of ANN, ANFIS, and SVM using statistical, fractal and PCA features. Next, the noise tolerance of each classifier was examined.

Measured PD in PRPD format
In order to determine the classification accuracy of each feature extraction and intelligent classifier method under a noisy environment, the classifiers are trained using uncontaminated input but tested with inputs that are contaminated with noise. Feature extraction is performed on the contaminated PD data and the result is used as the test input to each classifier. The noise contamination was recorded from ground interference during lightning events. The recorded noisy signals are added to the noise-free PD data for durations of between 5 and 60 seconds.
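The contamination step can be sketched as follows, assuming that both the PD data and the recorded noise are available as lists of time-stamped pulses and that a noise duration selects the noise pulses recorded within that time. The data layout, the function name and the pulse values are hypothetical and serve only to illustrate the idea.

```python
def contaminate(pd_pulses, noise_pulses, duration_s):
    """Add the noise pulses recorded in the first `duration_s` seconds to the PD pulses.

    Every pulse is a (time in s, phase in degrees, charge in pC) tuple.
    The result is sorted by time.
    """
    selected = [p for p in noise_pulses if p[0] < duration_s]
    return sorted(pd_pulses + selected)

# Hypothetical noise-free PD pulses and recorded noise pulses.
pd_pulses = [(0.5, 80.0, 300.0), (20.0, 260.0, -280.0)]
noise_pulses = [(1.0, 10.0, 120.0), (4.0, 190.0, -200.0), (30.0, 300.0, 250.0)]
test_5s = contaminate(pd_pulses, noise_pulses, 5)
test_60s = contaminate(pd_pulses, noise_pulses, 60)
```

A longer noise duration adds more noise pulses to the same PD data, which is why the duration can be used as the contamination level.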
Four of the PRPD patterns of the recorded noisy signal are shown in Fig 2. It can be seen that the noise pattern occurs randomly at every phase with a maximum amplitude of 250 pC, and the amount of PD activity increases as the duration of noise increases. Since it is impossible to control the amplitude of the noise contamination, different durations of noise contamination were used to examine the classification accuracy of each classifier method under a noisy environment.
The phase resolved partial discharge (PRPD) pattern of all cable joint samples that have been measured is shown in a 3D plot in Fig 3. Based on visual inspection on the PRPD patterns, the insulation incision defect has two tall peaks at the end of both positive and negative cycles. The axial directional shift defect has more PD activities in the positive cycle, which accumulate at the first quadrant. It has a very sharp peak at around 80 degrees. The semiconductor layer tip defect has PD activities, which extend evenly between the positive and negative cycles. It has 5 noticeable peaks, 3 at the negative cycle and 2 at the positive cycle. The metal particle on XLPE defect has one main PD group at each positive and negative cycles and it has a prominent peak at 260 degrees. The semiconductor layer air gap defect has two main PD groups; one at the positive cycle and another at the negative cycle with a peak at 230 degrees. Two small clusters of PD with high charge magnitude but low pulse count can also be seen, where the negatively charged PD spread out between 180 to 360 degrees while the positively charged PDs are distributed between 0 and 180 degrees.
Although different defect types of cable joint have different PRPD patterns, classification of different defect types in the cable joint samples can hardly be done based on visual inspection of these PRPD patterns alone. Therefore, feature extraction from PD data and intelligent classifiers were used in this work to classify different defect types in the cable joint.

Feature extraction results
Using the feature extraction methods of statistical features, fractal features and PCA features, eight groups of input feature data were obtained. Statistical features are split into three groups: the first group consists of variance, skewness, kurtosis and mean (var, skew, kur, mean), the second group consists of the Weibull parameters and the third group is the combination of the first two groups. Fractal features were also split into three groups: fractal dimension, lacunarity and a combination of fractal dimension and lacunarity. PCA features were split into two groups: PCA features extracted from 4 distributions and from 6 distributions. These input features were used as the input for the classifiers to determine the classification accuracy of each method. Sample input features extracted using each method are shown in Tables 2, 3 and 4.

Classification results
In this work, the 10-fold cross validation method was used. The data were randomly partitioned into 10 equal-sized subsamples. One subsample was used for testing and the remaining nine subsamples were used for training. The process was repeated 10 times, with each subsample taking a turn as the test sample, and the mean accuracy was calculated. The classification results of ANN, ANFIS and SVM using different feature extraction methods are shown in Table 5. From Table 5, SVM has the highest overall classification accuracy. ANFIS performed better than ANN when using statistical features but is the worst when using fractal features and PCA features. ANFIS is weak when using PCA features because ANFIS requires normalizing the input data during the training process to improve its efficiency [63]. Each PCA component carries a different weighting; hence, normalization changes the relative significance between components, causing a higher error rate in ANFIS [68].
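The 10-fold procedure can be sketched as follows. The sample count of 500 follows from the five cable joints with 100 measurements each; the function names, the random seed and the placeholder `evaluate` callback (which stands in for training and testing a classifier) are assumptions made for this illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k randomly partitioned folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_samples, evaluate, k=10):
    """Mean of `evaluate(train, test)` over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# 5 cable joints x 100 measurements each.
splits = list(k_fold_indices(500))
# Placeholder evaluation: returns the test fraction instead of a real accuracy.
mean_score = cross_validate(500, lambda train, test: len(test) / (len(train) + len(test)))
```

Each sample appears in exactly one test fold, so every measurement is used once for testing and nine times for training.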
It can be seen that for all three classifiers, using the combination of all statistical features and fractal features rather than splitting them results in higher classification accuracy. For PCA features, all classifiers are able to achieve higher classification accuracy when PCA is performed on 6 distributions instead of 4 distributions. Therefore, for classification accuracy test using noisy signals, only the full set of statistical parameters (variance, skewness, kurtosis, mean, Weibull parameters), fractal features (fractal dimension and lacunarity) and PCA features for 6 distributions were considered. The effect of increasing feature size on the training duration for all classifiers is shown in Fig 4. From this figure, SVM has the fastest training speed, followed by ANN and ANFIS. SVM and ANN training speed is not directly affected by the size of the input feature and remains relatively consistent when the feature size is increased. ANFIS, on the other hand, experiences increased training duration when the input feature size is increased.
After the performance of ANN, ANFIS and SVM with noise-free PD data has been evaluated, the classifiers were tested using features extracted from PD data that have been overlapped with noise contamination of different durations. The classification accuracy results of these classifiers are shown in Table 6. From Table 6, it can be seen that the classification accuracy generally decreases for all classifiers and input feature combinations when more noise is added.
The plot of classification accuracy against the duration of noise contamination added to PD data for all classifiers is shown in Fig 5. From this figure, it can be seen that although statistical features and fractal features suffer from a significant reduction in classification accuracy, statistical features still achieve a higher classification rate than fractal features for ANN and ANFIS when the noise duration is increased. ANFIS performs slightly better with fractal features as the input for noise-free PD data, but under noise its performance with statistical features is better than with fractal features.
When each classifier was tested with noisy PD data, the classifier with PCA features as the input data performs better than with fractal features and statistical features. Although the classification accuracy using PCA features is not the best for noise-free PD data for all classifiers, it is better than that of the other feature extraction methods when tested with noisy data. This is because the changes to the original PD data caused by noise are minimized during the process of transforming the PD data from a higher dimension to a lower dimension. Thus, classification accuracy using PCA features and intelligent classifiers is less affected by different durations of noise contamination compared to statistical and fractal features. Referring to Table 6, the best classification method is PCA combined with SVM, where the highest classification accuracy is 93.6% under 5-second noise duration and 75.6% under 60-second noise duration. In comparison, in previous works, the accuracy of ANN reduces from 79% under noise-free conditions to 42.2% with 10% added noise in [1], while in [34,38], when 30% noise was introduced, the accuracy of ANN reduces from 100% to between 70 and 80%. Hence, the proposed method in this work is reasonable and is an improvement over the previous methods used for PD classification under noisy conditions, because the reduction in classification accuracy when noise contamination is added to the PD signals is smaller than in previous works.

Conclusions
Classification of real cable joint defect types from partial discharge measurement under a noisy environment has been successfully performed. Feature extraction was performed on the PD data and the results were used as the input data for artificial intelligence classifiers to classify cable joint defect types. From the classification accuracy results, feature extraction using principal component analysis (PCA) features with Artificial Neural Networks (ANN) and Support Vector Machine (SVM) classifiers shows the highest classification accuracy when tested with noisy PD data. The Adaptive Neuro-Fuzzy Inference System (ANFIS) classifier is not suitable for use with PCA features due to the design of the classifier, which requires normalization during training. Classification accuracy using fractal features and statistical features with the classifiers is better than using PCA features for noise-free PD data but is worse for noisy PD data. If computational time is not an important factor, it is recommended that the three input features (statistical features, fractal features and principal component analysis features) are used together to complement each other. However, if only one type of classifier and input feature is to be used in a highly noisy environment, PCA features with SVM or ANN are recommended for PD classification.