Process service quality evaluation based on Dempster-Shafer theory and support vector machine

Human involvement influences traditional service quality evaluations, leading to low accuracy, poor reliability and weak predictability. This paper proposes a method, called SVMs-DS, that employs support vector machines (SVMs) and Dempster-Shafer (D-S) evidence theory to evaluate the service quality of a production process while handling a large number of input features with a small sample set. Features that can affect production quality are extracted from a large number of sensors, and preprocessing steps such as feature simplification and normalization reduce their dimensionality. Basic probability assignments (BPAs) are constructed from three individual SVM models, which supports both qualitative and quantitative evaluation. The evaluation results are fused by the Dempster rule, and a decision threshold resolves conflicting results generated by the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.


Introduction
With the proposal of Made in China 2025, intelligent manufacturing has become a new research topic, and manufacturing remains the most vital component of industry. In intelligent manufacturing, a manufacturing activity is considered a type of service, and how to assess that service is an urgent problem for manufacturing enterprises. Advances in this field have been increasingly rapid. Many researchers believe that process evaluation based on multi-source manufacturing information is a useful measure, one that is flourishing although still little studied [1].
Some scholars have researched the evaluation of the process. To improve service quality, Zhang Y. [2] proposed a new three-dimensional analysis model of "service attributes, work flows and interactive contacts". With factor analysis, the system of evaluating the quality of engineering survey (SQES) was tested to demonstrate the evaluation process. A double-layer fuzzy evaluation model was proposed to evaluate engineering service quality by Zong Q. [3]; the fuzziness and randomness of service quality were considered synthetically in this model. Hrehova S. [4] showed the possibility of creating a predictive model for quality evaluation of the production process in MATLAB by using the basic tools of statistical process control and their graphic representation. Wang L. et al. [5] established a service quality gap model based on the production, management and application of the leaf area index product, and proposed a Remote Sensing Technical System of Evaluating Service Quality of Index Products (REI) on leaf area. There is limited research on service quality evaluation of the adaptive production system [6]. Moreover, most researchers focus on feature weighting for evaluation. Traditional scoring and defect deduction are widely used for quality assessment despite their strong subjectivity and bias. Single-feature evaluation is widespread. Some scholars study rough set theory, which combines the similarity and prioritization of a single feature to evaluate production process service quality. Che H. [7] believed that features could be extracted in different domains such as the time domain, time-frequency domain, and wavelet domain; the information collected is a kind of reflection of a device's state. Chen H. [8] combined brightness, contrast and a variety of other image distortion factors as a single feature to evaluate quality of service. Shalet K. [9] extracted auto-regressive moving average (ARMA) model features from vibration signals to improve machine performance. However, unreliable, incomplete and indistinguishable datasets can easily distort the results and render them meaningless. Therefore, the single-feature method has gradually been replaced by multi-feature pattern recognition and evaluation [10].
The last few years have seen the development of multi-feature evaluation. Yang G. [11] proposed an effective multi-feature weighted multi-resolution image fusion algorithm. The fusion weights were calculated by average-gradient feature adaptive weighting, and the multi-feature evaluation (edge features and average gradient features of the low-frequency coefficients and the correlation signal intensity ratio of the high-frequency coefficients) performed well. Alabi A. [12] used workability and efficiency as two features to evaluate the performance of the Eigenface algorithm for face recognition and demonstrated better performance of the Eigenface algorithm on distinct features than on plain features. Audhkhasi K. [13] extracted two features capturing long-term and short-term distortions with a two-scale non-intrusive auditory model to evaluate speech quality, with good results. Although the accuracy of multi-feature evaluation has improved [14], quite a few problems remain: the complexity and dimension of the feature space are relatively high [15], and fusion results obtained by simple synthesis are not precise, real-time or stable [16].
Currently, there is increasing interest in combining SVMs with D-S evidence theory to further improve the accuracy, efficiency and stability of service quality evaluation, especially in the area of fault detection. A fault diagnosis method based on the hierarchical support vector machine (HSVM) and Dempster-Shafer (D-S) theory for analog circuits was designed by Tang J. [17]. The output voltage signals were obtained from analog circuit test nodes, faulty feature vectors were extracted from Haar wavelet transform coefficients, and an HSVM model was built; conflicting evidence was fused through D-S theory. Zhou C. [18] believed that noise could make the least squares support vector machine (LS-SVM) less effective, and proposed more reliable modeling based on a robust LS-SVM and D-S theory. The evidence dataset was produced by a distributed LS-SVM, and conflicting evidence was further fused by D-S theory; this robust model could represent the original system well even in the presence of different types of random noise. A fault diagnosis method for a wheeled service robot driving system based on multi-principal component analysis (multi-PCA) models was suggested by Yuan X. [19], which combines SVM and D-S evidence theory. The multi-PCA models are used to extract features from sensor data, the features are fed to the SVM classifiers, and the basic probability assignment (BPA) is determined by integrating the confidence values.
Based on the above literature review, the existing evaluation models include single-feature evaluation, multi-feature weight fuzzy evaluation and multi-feature fusion evaluation. Single-feature evaluation cannot fully reflect the production system [20]; objectivity and fairness are weak in multi-feature weight fuzzy evaluation [21]; and multi-feature fusion, widely used in fault detection, is another good evaluation method for service quality, but it cannot make full use of all the feature information. Therefore, in this paper, a new service quality evaluation model based on SVMs-DS is proposed, which can utilize the aggregate information collected. This is particularly needed for a production process such as solar cells, where production quality features are not easy to extract. Section 2 presents the SVM method, D-S theory and the proposed service quality evaluation model based on SVMs-DS. In Section 3, a case study is presented to demonstrate the application of the proposed model. Finally, the discussion and conclusion are provided in Sections 4 and 5, respectively.

Support vector machine (SVM)
A support vector machine (SVM) is a statistical supervised machine learning technique, used both for classification and for regression purposes [22], originally proposed by Vapnik and Cortes in 1995 [23]. The original SVM proposal addresses both the binary classification problem and the multi-class classification problem [22].
SVM is one of the most successful methods of machine learning, which integrates the maximum-separation hyperplane decision boundary, the Mercer kernel, convex quadratic optimization and slack variables. It is used primarily to solve nonlinear classification and regression problems. In recent years, it has gradually been applied to sample analysis, factor selection, information compression, knowledge mining, etc.
The basic principle of SVM. The decision boundary hyperplane in SVM classification is calculated from a training dataset. This decision boundary is completely defined by the support vectors, a subset of the training input vectors, which by themselves lead to the same decision boundary [22]. The sample space is mapped to a high- or infinite-dimensional feature space (a Hilbert space, a generalization of Euclidean space [24]), where linear learning machines can solve highly nonlinear classification and regression problems by using a Mercer kernel. The SVM classifier is then ready to be used with a different dataset than the one used in the training stage [25]. The nonlinear mathematical model for the SVM based on kernel methods is the dual optimization problem

$\max_{\alpha} \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \varphi(X_i) \cdot \varphi(X_j)$, subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$,

where l is the number of samples, $X_i \in R^l$, $y_i \in \{+1, -1\}$, the $\alpha_i$ are Lagrange multipliers, C is the penalty factor, and $\varphi(X_i)$ is the nonlinear transform of $X_i$.
Finally, the optimal hyperplane decision function is obtained:

$f(X) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(X_i, X) + b\right)$,

where $K(X, Y)$ is the kernel function. Different kernel functions in SVM form different algorithms, and the commonly used kernel functions are as follows:

• Gaussian radial basis function (RBF): $K(X, Y) = \exp\left(-\|X - Y\|^2 / (2\sigma^2)\right)$, where σ is a Gaussian parameter.
Each base function center of this classifier corresponds to a support vector; the base function centers with their output weights can be automatically determined by the algorithm. This is the major difference between this RBF classifier and the traditional one.
• Polynomial kernel function: $K(X, Y) = (X \cdot Y + 1)^q$. This is a q-order polynomial classifier.

• Sigmoid function: $K(X, Y) = \tanh(\kappa\, X \cdot Y + c)$, where κ and c are kernel parameters.
Here, the SVM forms a multi-layer perceptron classifier with hidden layers. The number of hidden nodes is automatically determined by the algorithm, which avoids the local-minimum problem.
• Linear kernel function: $K(X, Y) = X \cdot Y$. In addition to the above kernel functions, there are others such as the exponential kernel, Laplacian kernel, multiquadric kernel, rational quadratic kernel, and so on.
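As a concrete reference, the four kernels listed above can be written directly in NumPy. This is a minimal sketch; the sigmoid parameters κ and c and the polynomial order q are illustrative defaults, not values from this paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF: K(X, Y) = exp(-||X - Y||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, q=3):
    # q-order polynomial: K(X, Y) = (X . Y + 1)^q
    return (np.dot(x, y) + 1.0) ** q

def sigmoid_kernel(x, y, kappa=0.1, c=-1.0):
    # Sigmoid: K(X, Y) = tanh(kappa * X . Y + c)
    return np.tanh(kappa * np.dot(x, y) + c)

def linear_kernel(x, y):
    # Linear: K(X, Y) = X . Y
    return np.dot(x, y)

# An RBF kernel value decays with the squared distance between the inputs.
print(round(rbf_kernel(np.zeros(2), np.array([0.0, 2.0])), 4))  # -> 0.1353
```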
The selection of SVM parameters. The penalty factor C and parameters of SVM kernel function (such as g or σ in RBF) are all referred to as SVM parameters. The penalty factor C adjusts the confidence interval and the empirical risk ratio for machine learning within a data subspace to improve the learning machine's extension performance.
Within the data subspace, a small value of C indicates a small penalty for the empirical error and a less complex learning machine, where the empirical risk can increase; this is the 'lack of learning' (under-fitting) case. The opposite case is recognized as 'over learning' (over-fitting). There is at least one suitable C in each data subspace that yields an optimized SVM classifier. When C exceeds a certain value, the complexity of the SVM reaches the maximum allowed in the data subspace; in this case, the empirical risk and the generalization ability are almost fixed. However, there is no uniform method to determine the optimal C.
Experiments led by Chang C. [26] show that as C increases, the test accuracy increases, the number of support vectors decreases, and the number of support vectors at the boundary decreases rapidly until none remain at the boundary. Based on experiments [27], C has a great impact on training results, and its optimal value depends on the specific condition. In general, the larger the amount of training data is, the less susceptible the training result is to a change of C. If the training set is small, a larger C value (as allowed by the system) is better. Therefore, if sample quantities are unequal, C should be inversely proportional to the sample quantity. g is the gamma parameter of the kernel function, and its default value is the reciprocal of the number of features; for the RBF kernel, $g = 1/(2\sigma^2)$, where σ is the Gaussian parameter.
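Since no uniform method fixes the optimal C, a cross-validated grid search is a common practical choice. The sketch below uses scikit-learn on synthetic stand-in data; the parameter grid and dataset are illustrative, not the paper's LibSVM setup. With 10 features, g = 1/10 mirrors the default g = 1/(number of features) described above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for a sensor feature set (hypothetical data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5-fold cross-validated search over C and gamma for an RBF SVM.
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100],
                                "gamma": [1 / 10, 0.01, 0.001]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```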

D-S evidence theory
The Dempster-Shafer (D-S) theory is useful for handling uncertainty and imprecision and does not require assigning prior probabilities. D-S evidence theory is a generalization of Bayesian theory: it can distinguish 'uncertain' from 'do not know' without prior probabilities, which gives it excellent generalization performance. Based on these characteristics, D-S evidence theory is often used to address multi-source, uncertain information. The recognition framework, denoted by Θ (whose elements are mutually exclusive), is the complete set of all possible answers to a problem.

Definition 1: Let Θ = {A_1, A_2, ..., A_n} be the universe; it is the recognition framework. The basic probability assignment (BPA) function m is a mapping from $2^\Theta$, the set of all subsets of Θ (including the empty set), to [0, 1], such that

$m(\emptyset) = 0$ and $\sum_{A_i \subseteq \Theta} m(A_i) = 1$.

Here, $m(A_i)$ is called the basic probability assignment (BPA, or mass function) of $A_i$, where $A_i$ is a member of the power set. The mass $m(A_i)$ expresses all relevant and available evidence supporting the claim that the actual state belongs to $A_i$.

Definition 2: The belief function Bel is a mapping from $2^\Theta$ to [0, 1]; for any subset A of the recognition framework Θ = {A_1, A_2, ..., A_n},

$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B)$.

When m(A) > 0, A is called a focal element of the belief function Bel, and the union of all focal elements forms the kernel (core) of the recognition framework Θ.
According to D-S theory, the quantity $\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A})$, where $\bar{A}$ is the negation (complement) of A, is called the plausibility of A. Generally, we have [28]

$\mathrm{Bel}(A) \le P(A) \le \mathrm{Pl}(A)$.

From this inequality, a body of evidence about some set of hypotheses provides a set of compatible probabilities bounded by a belief function Bel(A) and a plausibility function Pl(A) [29], even if the system information is insufficient or the system is disturbed by various kinds of stochastic noise. This means that the boundaries of the real system output can be calculated even in the presence of Gaussian noise or other kinds of stochastic uncertainty in the data. In this way, the true system information can be extracted from the data through belief functions using D-S theory.
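A minimal sketch of how Bel and Pl follow from a mass function; the frame {q1, q2} and the masses are illustrative, with focal elements represented as frozensets.

```python
def bel(mass, A):
    # Belief Bel(A): total mass committed to non-empty subsets of A.
    return sum(m for B, m in mass.items() if B and B <= A)

def pl(mass, A):
    # Plausibility Pl(A): total mass not contradicting A,
    # i.e. mass of focal elements that intersect A.
    return sum(m for B, m in mass.items() if B & A)

# Masses keyed by focal element over the frame {q1, q2}.
mass = {frozenset({"q1"}): 0.6,
        frozenset({"q2"}): 0.1,
        frozenset({"q1", "q2"}): 0.3}
A = frozenset({"q1"})
print(bel(mass, A), pl(mass, A))  # Bel(A) <= Pl(A)
```

Note that Pl(A) also equals 1 − Bel(Ā): here Bel({q2}) = 0.1, so Pl({q1}) = 0.9.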
Dempster rule. In D-S theory, different independent sets of masses can be aggregated using the Dempster rule of combination: the belief shared by multiple sources is extracted, and all conflicting (non-shared) belief is ignored [30][31]. Generally, let Bel_1 and Bel_2 be two belief functions on the same framework U, let m_1 and m_2 be their basic probability assignments (BPAs), and let their focal elements be $B_i$ and $C_j$, respectively. Then:

$m(A) = \frac{1}{1-k} \sum_{B_i \cap C_j = A} m_1(B_i)\, m_2(C_j)$ for $A \ne \emptyset$, with $m(\emptyset) = 0$,

where

$k = \sum_{B_i \cap C_j = \emptyset} m_1(B_i)\, m_2(C_j)$

is the conflict factor, which reflects the degree of conflict between the pieces of evidence.

Decision-making rule. Let $A_1, A_2 \subseteq U$ be the hypotheses with the largest and second-largest masses, respectively. Then $A_1$ is the decision result if

$m(A_1) - m(A_2) > \varepsilon_1$, $m(\Theta) < \varepsilon_2$, and $m(A_1) > \varepsilon_3$,

where $\varepsilon_1, \varepsilon_2, \varepsilon_3$ are preset thresholds and Θ is the uncertain set.
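The combination and decision rules above can be sketched as follows. Focal elements are represented as frozensets; the frame {L4, L5}, the masses and the threshold defaults are illustrative values, not the paper's.

```python
def combine(m1, m2):
    # Dempster's rule of combination for two BPAs over the same frame.
    # k accumulates the conflicting (disjoint-intersection) mass.
    k, fused = 0.0, {}
    for B, mb in m1.items():
        for C, mc in m2.items():
            inter = B & C
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                k += mb * mc
    if k >= 1.0:
        raise ValueError("complete conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - k) for A, v in fused.items()}, k

def decide(mass, theta, eps1=0.34, eps2=0.06, eps3=0.6):
    # Accept the top hypothesis A1 only if m(A1) - m(A2) > eps1,
    # m(Theta) < eps2 and m(A1) > eps3; otherwise return None.
    ranked = sorted(((v, A) for A, v in mass.items() if A != theta),
                    key=lambda t: t[0])
    m_a1, a1 = ranked[-1]
    m_a2 = ranked[-2][0] if len(ranked) > 1 else 0.0
    if m_a1 - m_a2 > eps1 and mass.get(theta, 0.0) < eps2 and m_a1 > eps3:
        return a1
    return None

# Two pieces of evidence over the frame {L4, L5} (hypothetical masses).
THETA = frozenset({"L4", "L5"})
L5, L4 = frozenset({"L5"}), frozenset({"L4"})
m1 = {L5: 0.8, L4: 0.1, THETA: 0.1}
m2 = {L5: 0.7, L4: 0.2, THETA: 0.1}
fused, k = combine(m1, m2)
print(k, decide(fused, THETA))
```

Here k = 0.8·0.2 + 0.1·0.7 = 0.23, and after normalization the fused mass on L5 is large enough to pass all three thresholds.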
D-S evidence theory has several defects: it cannot address serious or complete conflicts; it is difficult for the theory to identify the fuzziness of synthesized evidence; and the more elements a subset has, the fuzzier the subset becomes. Many improved algorithms have been proposed by domestic and overseas scholars, such as the Yager algorithm, the improved Yager algorithm, the weighted evidence combination method, the weighted assignment conflict method, the weighted absorption method based on confidence, etc. [32].

The proposed algorithm of SVMs-DS
The proposed quality evaluation requires three SVM structures, each with its own kernel function; each SVM is independent of the others. Therefore, the different SVMs may provide contradictory evaluation results. The proposed method employs D-S theory to combine the contradictory accuracies (the BPAs) in the testing phase, so that the conflicting results generated there can be refined. The three kernel-function SVM structures are one of the innovations of this paper. The proposed algorithm of SVMs-DS is shown in Fig 1.

Input: the feature set. Output: the fusion result of service quality evaluation.

STEP1: The data set is constructed from the information collected by sensors.

STEP2: The feature set is generated after normalization and dimensionality reduction.
STEP3: Train with each kernel function. The preprocessed results are randomly split into a training set and a prediction set. The SVM parameters must be set or optimized before training. After SVM training, the feature space is partitioned by hyperplane decision boundaries into multiple subspaces. Three different kernel functions (RBF, Linear, and Sigmoid) are used for training, and the resulting models are labeled SVM1, SVM2 and SVM3.

STEP4: Test the prediction set with SVM1/SVM2/SVM3 and obtain the basic probability assignments (BPAs). A voting mechanism is applied to the BPAs.
STEP5: Fuse the BPAs by D-S theory. The BPAs serve as independent evidence input to D-S theory. After the conflict factor k and the mass functions are calculated, the fusion result of the service quality evaluation is assessed against the decision threshold, and the final service quality evaluation is issued.
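The five steps can be sketched end to end with scikit-learn. The synthetic data, the PCA dimension and the use of predict_proba outputs as BPAs are illustrative assumptions; with mass on singleton hypotheses only, the Dempster rule reduces to a normalized element-wise product of the BPAs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# STEP1-2: synthetic stand-in for the sensor feature set,
# normalized to [0, 1] and reduced by PCA.
X, y = make_classification(n_samples=196, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=150, random_state=0)

# STEP3: three independent SVMs, one per kernel.
models = {
    kernel: make_pipeline(MinMaxScaler(), PCA(n_components=10),
                          SVC(kernel=kernel, probability=True,
                              random_state=0)).fit(X_tr, y_tr)
    for kernel in ("rbf", "linear", "sigmoid")
}

# STEP4: class-membership probabilities of one test sample serve as its BPAs.
bpas = {kernel: m.predict_proba(X_te[:1])[0] for kernel, m in models.items()}

# STEP5: singleton-only Dempster fusion = normalized product of the BPAs.
fused = np.ones_like(bpas["rbf"])
for p in bpas.values():
    fused *= p
fused /= fused.sum()
print("fused evaluation level:", fused.argmax())
```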

Case study
Preprocessing of the feature set

The data set (STEP1). The original data are collected by sensors and other measurements, so the first step is to install a large number of sensors. Some key sensors in the production line are shown in Fig 2. MPXH6300/MPXV5100/MPXV5500/MPXH6400 are pressure sensors used to estimate the pneumatic clamp force and cutting force; they differ in range and accuracy. EVT-20PH and E3FA-DN11 are utilized to count the number of solar cells. SSA00XXH2-485 determines deflection angles, and JCJ100ZB measures temperature. The CCD performs edge detection. The gas/hydraulic sensors are all used to monitor the liquid level position indicated in Fig 3. The solar string welder production process can be divided into several systems: feeding system, loading system, CCD vision system, manipulator transfer system, release system, pressure belt system, cutting mechanism system, welding platform system, photovoltaic welding system, film system, lateral transfer system, flip mechanism and so on. In total, 309 features are collected, as shown in Table 1.

Feature set (STEP2). Singular features, such as major equipment failures and solar string welder switches, are eliminated from the above 309 features by rule-based data screening. After feature elimination, a feature set containing 86 values is constructed, as shown in Table 2.
Principal component analysis (PCA) is used to lower the dimension of the 86 features to simplify the calculation. The main principle of PCA is:

$F_i = \sum_{j} a_{ij} x_j$,

where $x_j$ denotes the original eigenvector (feature), $F_i$ denotes the component obtained by PCA, and $a_{ij}$ is the dimension-reduction matrix. After the 86 features are reduced to 57 dimensions, the feature vectors are normalized into [0, 1].
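A minimal NumPy sketch of this reduction (86 → 57 components) followed by min-max normalization; the data here are random stand-ins for the real feature set.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(196, 86))       # 196 samples x 86 retained features (toy)

# PCA via eigendecomposition of the covariance matrix: F = Xc @ A,
# where A holds the top 57 eigenvectors (the dimension-reduction matrix).
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
A = eigvecs[:, ::-1][:, :57]         # columns sorted by descending eigenvalue
F = Xc @ A                           # 86 features -> 57 components

# Min-max normalization of each component into [0, 1].
F_norm = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0))
```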
$x = \frac{x_0 - x_{0,\min}}{x_{0,\max} - x_{0,\min}}$,

where $x_0$ stands for a single vector value, $x_{0,\min}$ is its minimum value, $x_{0,\max}$ is its maximum value, and x is the normalized result of $x_0$. If $x_0 = x_{0,\min}$, then x = 0; if $x_0 = x_{0,\max}$, then x = 1. The results are shown in Table 3.

Table 1. The features of the production process.

Production preparation process: Belt/flux/air-pressure battery separator air blowing/servo reset/reset/fault detection, etc. 43 items in total.
Feeding system: Main grid direction of cell/number of feeds/step forward or backward/step specifications, etc. 13 items in total.
Loading system: Status of suction fan/suction position of cell discharge area/separation air knife/cylinder waiting position/grab position, etc. 18 items in total.
CCD vision system: Number of NG pieces/battery plate specifications/cell edge detection/silk screen detection/corner detection/grid line detection/calibration of the standard features, etc. 66 items in total.
Manipulator transfer system: Reset button/ROB connection status/fetch to CCD platform/battery box to CCD/battery box to the adjust platform/the adjust platform to the CCD camera features, etc. 20 items in total.
Release system: Continuous welding traction/tension tightened state, etc. 8 items in total.
Pressure belt system: Fault status/welding band bending/bending cylinder, etc. 7 items in total.
Cutting mechanism system: Tape length/tail-to-tail empty state and length/number of cells/cut-and-hold state, etc. 18 items in total.
Welding platform system: Insulation state and insulation temperature/cooling state and cooling temperature, etc. 15 items in total.
Photovoltaic welding system: Real-time production/welding temperature/traction jaw/welding time/conveyor speed, etc. 25 items in total.
Film system: Film material selection/film tightening/film transfer mechanism state/floor temperature conditions, etc. 12 items in total.

Train the set by the RBF/Linear/Sigmoid function (STEP3)

Since there is a total of 196 samples for the process service quality evaluation, 150 samples are randomly selected as the training set to build the model, and the remaining 46 samples form the prediction set to test the model. The data can be found in S1 Dataset. The parameters are set to C = 1.828 and g = 0.1 (calculated by LibSVM; c1 = 1.5 and c2 = 1.7 are the optimized parameters deduced by 100 iterations; software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm).
First, the non-linear dataset p{label: feature set} is mapped into a high/infinite-dimensional feature space (a Hilbert space), which transforms the non-linear problem in the original sample space into a linear problem in the feature space. Then, hyperplane decision boundary models are built by training the 150 samples using SVMs based on the RBF, Linear and Sigmoid kernel functions.

Test the prediction set (STEP4)

The remaining 46 samples form the prediction set, which is used to test the hyperplane decision boundary models built above. The test results show that, for the SVM based on RBF, 3 samples disagree with the evaluation given by the enterprise ATW; the SVM based on the Linear kernel has 2 such samples, and the SVM based on the Sigmoid kernel has 4. Table 4 shows the test results.
As shown in Table 4, the label of the first sample is 5, which means that the daily production service is evaluated as level 5 by an engineer (there are 5 different levels in ATW to express different service quality). The hyperplane decision boundary deduced by the SVM based on the Linear kernel (denoted SVM2) evaluates it as level 4, while those deduced by the SVMs based on the RBF and Sigmoid kernels evaluate it as level 5. The BPA results of all tests are shown in Table 5.

Recognition framework and the results (STEP5)
The BPAs produced by the different kernel functions for each sample do not depend on each other and can serve as inputs to D-S evidence theory [33]. Before fusion, a recognition framework should be built: the BPA of ATW's evaluation level is one subset of the recognition framework (denoted ATWBPA, m(A1)); the BPA of the predicted level is another subset (denoted PREBPA, m(A2)); and the sum of the other levels' BPAs forms the last subset, the uncertain information m(Θ). This study uses the first sample in Table 4 as an example. The first sample's ATW level, RBF SVM level and Sigmoid SVM level are all 5, while the Linear SVM level is 4; they clash with each other. For each kernel-function SVM, the BPA of level 5 is the ATWBPA (m(A1)), the BPA of level 4 is the PREBPA (m(A2)), and m(Θ), the uncertain information, is the sum of the other three levels, as shown in Table 6.
According to the Dempster rule, the RBF data/Linear data/Sigmoid data will take two steps to obtain the final decision. The first step is to fuse the RBF data and Linear data and calculate the first fusion data (intermediate data). The second step is to fuse the first fusion data with the Sigmoid data.
The conflict factor k of the first step is calculated by Eq (10):

$k = m(A_1)_{\mathrm{RBF}} \times m(A_2)_{\mathrm{Linear}} + m(A_2)_{\mathrm{RBF}} \times m(A_1)_{\mathrm{Linear}} = 0.3177$.

The BPA of the intermediate data is then calculated by Eq (11); the results are shown in Table 7.
The second step of the fusion uses the same method as the first step; the fusion results are shown in Table 8.
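The two-step fusion can be sketched as follows. Since Table 6 is not reproduced here, the per-kernel masses below are illustrative stand-ins, not the paper's values; the structure of the computation is the point.

```python
A1, A2 = frozenset({"A1"}), frozenset({"A2"})
THETA = frozenset({"A1", "A2", "other"})   # the uncertain set

def combine(m1, m2):
    # Dempster's rule; k is the conflict factor of this fusion step.
    k, fused = 0.0, {}
    for B, mb in m1.items():
        for C, mc in m2.items():
            if B & C:
                fused[B & C] = fused.get(B & C, 0.0) + mb * mc
            else:
                k += mb * mc
    return {S: v / (1.0 - k) for S, v in fused.items()}, k

# Hypothetical per-kernel BPAs for one sample over {m(A1), m(A2), m(Theta)}.
rbf     = {A1: 0.70, A2: 0.20, THETA: 0.10}
linear  = {A1: 0.25, A2: 0.60, THETA: 0.15}
sigmoid = {A1: 0.65, A2: 0.25, THETA: 0.10}

step1, k1 = combine(rbf, linear)     # step 1: RBF + Linear -> intermediate data
final, k2 = combine(step1, sigmoid)  # step 2: intermediate data + Sigmoid
```

With these masses, step 1 yields k1 = 0.70·0.60 + 0.20·0.25 = 0.47, and the final fused mass favors A1 even though the Linear evidence disagreed.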
All the contradictory results in Table 4 are fused, and the fusion results are shown in Table 9.
As shown in Table 9, using the first sample as an example, we obtain the following conclusions.
1) The fusion level is 5, which is equal to the ATW level.
2) In this paper, according to the statistical data, the thresholds are set as ε1 = 0.34, ε2 = 0.06 and ε3 = 0.6. ε1 stands for the required difference between m(A1) and m(A2); its minimum is one-third, and the larger the difference between the two contradictory results is, the higher the credibility. The mass function of the ATW level (m(A1)) is 0.6880, and that of the predicted result (m(A2)) is 0.2993, which satisfies the conditions m(A1) − m(A2) > ε1, m(Θ) < ε2 and m(A1) > ε3. Based on the above conclusions, the model is supported by the fusion result. In Table 10, 7 of 8 samples passed the threshold test.
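The threshold test for the first sample can be replayed directly from the reported masses m(A1) = 0.6880 and m(A2) = 0.2993:

```python
# Preset thresholds and the fused masses of the first sample.
eps1, eps2, eps3 = 0.34, 0.06, 0.6
m_a1, m_a2 = 0.6880, 0.2993
m_theta = 1.0 - m_a1 - m_a2          # 0.0127 of uncertain mass remains

assert m_a1 - m_a2 > eps1            # 0.3887 > 0.34: results differ enough
assert m_theta < eps2                # 0.0127 < 0.06: little uncertainty left
assert m_a1 > eps3                   # 0.6880 > 0.60: the winner is credible
print("threshold test passed: fusion level 5 is accepted")
```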

Discussion
The proposed process service quality evaluation method employs D-S theory to combine the BPAs of each SVM. The prediction results can be refined, and the prediction accuracy of SVMs-DS applied to process service quality evaluation increases, as shown in Table 10. The total numbers of support vectors deduced from the SVMs based on the RBF, Linear and Sigmoid kernels are 87, 90 and 117, respectively, without 'over learning'. Compared with the prediction accuracy of the RBF (93.4%), Linear (95.6%) and Sigmoid (91.3%) models, the prediction accuracy of the proposed D-S fusion (97.8%) is higher because it uses all the features and fuses the three different kernel functions to evaluate the quality of service.
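As a consistency check, the reported per-kernel accuracies correspond to error counts of 3, 2 and 4 on the 46 test samples, and 97.8% for the fusion implies a single error (45/46 ≈ 97.83%; the reported figures appear truncated to one decimal):

```python
# Accuracy implied by the error counts on the 46 test samples.
errors = {"RBF": 3, "Linear": 2, "Sigmoid": 4, "SVMs-DS": 1}
for name, e in errors.items():
    acc = 100.0 * (46 - e) / 46
    print(f"{name}: {acc:.2f}%")
```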
The advantages of the proposed method are:
• Compared with the traditional SVM-DS, the application domain of this paper is a new pattern recognition task (process service quality evaluation) instead of fault diagnosis, and the proposed evaluation method fuses three different kernel functions;
• Compared with single-feature evaluation, the proposed method has higher accuracy, smaller fluctuation and stronger stability;
• Compared with the fuzzy weight analysis method, the proposed method is more objective through machine learning; and
• Compared with multi-feature evaluation, the proposed approach makes full use of all the features.

Conclusions
This paper proposes an SVMs-DS method to evaluate service quality. The BPAs are put into the D-S method as independent evidence, and the Dempster rule and decision thresholds are introduced into the algorithm. SVMs-DS is found to be superior and more accurate for process service quality evaluation. The method combines three kernel-function-based SVMs in parallel: each SVM is treated as independent evidence, and the final evaluation is obtained by combining the three individual SVM results and making the ultimate decision with D-S evidence theory. The experimental results show that SVMs-DS raises the accuracy and stability of the service quality evaluation, achieving 97.8% prediction accuracy.
Although SVMs-DS can obtain a test success rate of 97.8% with suitable sigma and penalty factors, the machine dataset is the only information resource. As a system, operator information such as muscle fatigue and brain fatigue should also be considered when evaluating service quality. The collection, extraction and fusion of these physiological data based on neural industrial engineering represent an important direction for future research.
Supporting information

S1 Dataset. The dataset of the process service quality evaluation. (MAT)