Monitoring indexes of concrete dam based on correlation and discreteness of multi-point displacements

Monitoring indexes are important for real-time monitoring of dam performance and for ensuring safe and normal operation. Traditional methods for establishing monitoring indexes mostly focus on single-point displacements, and rational monitoring indexes based on multi-point displacements are rare. This study establishes monitoring indexes based on the correlation and discreteness of multi-point displacements. The proposed method is applicable when several monitoring points are strongly correlated. Principal component analysis (PCA) was introduced to preprocess the observations of multi-point displacements, and the correlation and discreteness parts of the multi-point displacements were extracted and constructed. The correlation and discreteness parts describe the integral and local variation of the displacement field, respectively. On this basis, the annual maximum values of the correlation and discreteness parts were selected, and their probability density functions (PDFs) were generated using the principle of maximum entropy. Because the maximum entropy method uses only the moment information of the observations, the resulting PDFs are the least subjective. The multi-point monitoring indexes were then determined from the obtained PDFs by the typical low probability method. Finally, the proposed method was applied to a practical engineering project, which verified its feasibility.


Introduction
Hydraulic structures are indispensable infrastructure because of their comprehensive functions and benefits in flood control, irrigation, and power generation [1,2]. Safety monitoring indexes based on prototype observations are frequently adopted for hydraulic structures [3][4][5][6][7], and displacement is one of the major monitored items for dam safety [8]. As monitored items and data become more varied and numerous, increasing attention has been paid to monitoring indexes that consider multiple monitored variables [9][10][11]. A concrete dam is a dynamic system exposed to various non-deterministic influences, such as environmental variation, hydraulic loads, and geological factors. These non-deterministic factors affect the overall displacement of a concrete dam in various ways [12,13]. Therefore, monitoring indexes based on the interrelationship of multi-point displacements need to be developed.
Various methods for establishing monitoring indexes based on multi-point displacements have been proposed in several references [14][15][16]. Huang et al. [16] investigated the reasonable form of presetting factors of two-dimensional (2D) space mathematical model with multiple survey points, and they established a statistical model for the Danjiangkou Dam. Yu et al. [15] adopted the principal component analysis (PCA) in multivariate analysis of dam monitoring data and they established the overall control region for dam safety monitoring. Yang et al. [14] determined multi-stage warning indicators for the overall deformation of concrete dam considering fuzziness and randomness, and they achieved nondeterministic optimal control.
Monitoring indexes based on multi-point displacements must account for the correlation among the observations. In the present study, PCA [17][18][19][20] was introduced to reduce the dimension of the observations and capture the correlation of the observed displacements. PCA is a popular variable reduction technique [8,9] that converts an m-vector of observed displacements into an m-vector of mutually uncorrelated principal components. The correlation and discreteness parts of the multi-point displacements were then extracted and constructed with PCA.
Thereafter, the maximum entropy method (MEM) was used to generate the probability density functions (PDFs) of the correlation and discreteness parts. According to the principle of maximum entropy, MEM selects the PDF that maximizes the entropy subject to the moment constraints, which makes it the least subjective choice given the available information. MEM has been used to estimate probability distributions in many fields [21][22][23][24][25]. The monitoring indexes of the correlation and discreteness parts can then be determined using the typical low probability method.
The rest of the paper is organized as follows. Section 1 introduces PCA for extracting the principal components of multi-point displacements. The correlation and discreteness parts are also obtained in this section. Section 2 outlines the MEM for generating the PDFs of the annual extreme values of the correlation and discreteness parts. Section 3 demonstrates how the monitoring indexes of the correlation and discreteness parts can be decided using the typical low probability method based on PDFs. Finally, Section 4 analyzes a numerical example for verifying the feasibility of the proposed method.

PCA for correlation and discreteness of multi-point displacements
False alarms, data redundancy, and noise [15] are the three key problems in dam safety monitoring based on prototype observations. By accounting for the correlation among numerous observed items, PCA helps reduce the frequency of false alarms, extracts the principal components, reduces data redundancy, and lessens the effect of noise on the data analysis. Therefore, PCA was introduced in this study to process the observed multi-point displacements and to extract the correlation and discreteness parts.

Rationale of PCA for multi-point displacements
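For m dam displacement monitoring points with a certain correlation, the simultaneous observations at the m points were arranged into the matrix of observed multi-point displacements:

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mN} \end{bmatrix} \tag{1}$$

where $x_i = (x_{i1}, x_{i2}, \ldots, x_{iN})$ is the row vector of the N observations at point i. PCA replaces the displacements at the m monitoring points with k principal components $u_1, u_2, \ldots, u_k$ (k < m), which contain the vast majority of the information in the original observations. The k principal components are linear combinations of the observed displacements:

$$u_j = l_{j1} x_1 + l_{j2} x_2 + \cdots + l_{jm} x_m, \quad j = 1, 2, \ldots, k \tag{2}$$

where $l_{ji}$ are the coefficients of the linear combinations. The principal components must satisfy the following requirements: 1. $u_i$ and $u_j$ ($i \neq j$; $i, j = 1, 2, \ldots, k$) are uncorrelated, i.e. $u_i u_j^{T} = 0$; 2. $u_i$ has the maximum variance among all linear combinations of $x_1, x_2, \ldots, x_m$ that are uncorrelated with $u_1, u_2, \ldots, u_{i-1}$.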
Based on the aforementioned analysis, PCA for multi-point displacements determines the linear combination coefficients $l_{ji}$; the coefficient vectors are the eigenvectors corresponding to the k largest eigenvalues of the correlation matrix of the multi-point displacements. A two-dimensional example is presented below to illustrate the mathematical meaning of PCA. $P_1$ and $P_2$ are two observation points with a certain correlation. Fig 1 shows the observations of the two points over a certain time period, and Fig 2 illustrates the two principal directions of these two-point observations. Because the two observation points are correlated, the observations lie near a straight line and roughly form an ellipse. The coordinate system is then rotated by an angle θ in 2D space, and the major and minor axes of the ellipse are denoted as $u_1$ and $u_2$, respectively. The rotation formula is as follows:

$$\begin{cases} u_{1j} = x_{1j}\cos\theta + x_{2j}\sin\theta \\ u_{2j} = -x_{1j}\sin\theta + x_{2j}\cos\theta \end{cases} \tag{3}$$

where j = 1, 2, ..., N, and N is the number of observations. The rotation formula can also be expressed in matrix form:

$$\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = U \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{4}$$

where U is the coordinate rotation matrix,

$$U = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix},$$

such that $U^{T} = U^{-1}$ and $UU^{T} = I$. As shown in Fig 1, after the rotation most of the variance of the observations lies along the $u_1$ axis. If the distribution of the multi-point displacements is to be expressed in one dimension, $u_1$ is the best direction because it guarantees the minimum loss of information from the original observations. The information of the displacement observations on $u_1$ is regarded as the first principal component, which represents the major correlation of the multi-point displacements, whereas the information on $u_2$ is regarded as the second principal component. When correlation exists among m observation points, an orthogonal transformation in m dimensions separates the multi-point displacements into m directions in order of importance. The specific steps of PCA for multi-point displacements are presented in the next section.
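As a rough numerical illustration of the two-point rotation, the sketch below computes θ from the sample second moments of two series and applies the rotation matrix U. It assumes the two series are already centred (for example, normalized as in Step 1 of the next section); the function name `principal_rotation` is hypothetical. The angle comes from the standard closed form tan 2θ = 2s₁₂/(s₁₁ − s₂₂) for the major axis of a 2D scatter, which makes $u_1$ and $u_2$ uncorrelated.

```python
import numpy as np


def principal_rotation(x1, x2):
    """Return theta and the rotated series (u1, u2) for two centred observation series.

    Hypothetical helper; assumes x1 and x2 have zero mean.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    s11, s22 = np.mean(x1 * x1), np.mean(x2 * x2)
    s12 = np.mean(x1 * x2)
    # angle of the major axis of the ellipse formed by the observations
    theta = 0.5 * np.arctan2(2.0 * s12, s11 - s22)
    U = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])  # rotation matrix of Formula (4)
    u = U @ np.vstack([x1, x2])                       # rows are u1 and u2
    return theta, u[0], u[1]
```

After the rotation, the mean product of $u_1$ and $u_2$ is zero (up to rounding), and the total variance is preserved but concentrated on $u_1$.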

Steps of PCA for multi-point displacements
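The steps of PCA for the correlation and discreteness parts of multi-point displacements are as follows.

Step 1. The matrix of multi-point displacement observations was normalized as follows:

$$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}, \quad i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, N \tag{5}$$

where $\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}$ and $s_i = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N}\left(x_{ij} - \bar{x}_i\right)^2}$ are the mean and standard deviation of the observations at point i. The normalized matrix was then obtained as follows:

$$\tilde{X} = \begin{bmatrix} \tilde{x}_{11} & \tilde{x}_{12} & \cdots & \tilde{x}_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{x}_{m1} & \tilde{x}_{m2} & \cdots & \tilde{x}_{mN} \end{bmatrix} \tag{6}$$

Step 2. The correlation matrix R was generated as follows:

$$R = \frac{1}{N-1}\tilde{X}\tilde{X}^{T} \tag{7}$$

Step 3. The secular equation of the correlation matrix R is

$$\left| \lambda I - R \right| = 0 \tag{8}$$

Its solutions are the m eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$.

Step 4. The m components of the multi-point displacements were calculated as follows:

$$\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} = V\tilde{X} \tag{9}$$

where $V^{T} = [\nu_1 \; \nu_2 \; \cdots \; \nu_m]$, and $\nu_i$ is the real unit eigenvector corresponding to the eigenvalue $\lambda_i$.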
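Steps 1 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes X is arranged as in Formula (1), with one row per monitoring point and one column per observation time, and the function name `pca_components` is hypothetical. Because R is symmetric, `numpy.linalg.eigh` is used, and its ascending eigenvalues are re-sorted in descending order.

```python
import numpy as np


def pca_components(X):
    """Steps 1-4 for X of shape (m, N): m monitoring points, N simultaneous observations."""
    X = np.asarray(X, dtype=float)
    N = X.shape[1]
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    Xn = (X - mean) / std                       # Step 1: normalized matrix
    R = Xn @ Xn.T / (N - 1)                     # Step 2: correlation matrix
    eigval, eigvec = np.linalg.eigh(R)          # Step 3: eigen-decomposition of symmetric R
    order = np.argsort(eigval)[::-1]            # descending eigenvalues
    eigval, eigvec = eigval[order], eigvec[:, order]
    V = eigvec.T                                # rows of V are the eigenvectors nu_i
    U = V @ Xn                                  # Step 4: rows are the components u_1..u_m
    return eigval, V, U
```

A useful sanity check is that the components are mutually uncorrelated with variances equal to the eigenvalues, and that the eigenvalues of a correlation matrix sum to m.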
Step 5. The number of principal components k was determined from the cumulative percent variance (CPV):

$$\mathrm{CPV}(k) = \frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{m}\lambda_i} \times 100\% \tag{10}$$

The CPV threshold c should be predetermined. When $\mathrm{CPV}(k) \ge c$ for the smallest such k, the first k components are taken as the principal components of the multi-point displacements. The threshold c is typically set between 70% and 90% [26].
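Step 5 amounts to a cumulative sum over the sorted eigenvalues. The hypothetical helper below returns the smallest k with CPV(k) ≥ c, with CPV expressed as a fraction rather than a percentage.

```python
import numpy as np


def n_principal_components(eigval, c=0.90):
    """Return the smallest k with CPV(k) >= c, and the CPV curve as fractions."""
    lam = np.sort(np.asarray(eigval, dtype=float))[::-1]    # descending eigenvalues
    cpv = np.cumsum(lam) / lam.sum()                        # Formula (10) without the x100%
    reached = np.nonzero(cpv >= c)[0]
    k = int(reached[0]) + 1 if reached.size else lam.size   # fall back to all components
    return k, cpv
```

For eigenvalues 3.0, 0.6, 0.3 and 0.1, the CPV curve is 75%, 90%, 97.5% and 100%, so a threshold of 85% keeps two components.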
Based on the aforementioned analysis, the m components of the multi-point displacements can be separated into two parts:

$$\sum_{i=1}^{m} u_i = \sum_{i=1}^{k} u_i + \sum_{i=k+1}^{m} u_i = U_p + U_e \tag{11}$$

where $U_p$ contains the vast majority of the information of the multi-point displacements; this information shows strong correlation and corresponds to the information on $u_1$ in Fig 2. Meanwhile, $U_e$ is the discrete or even uncorrelated information of the multi-point displacements; this information corresponds to the information on $u_2$ in Fig 2.
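Given the component matrix from Step 4 (one row per component, ordered by decreasing eigenvalue) and k from Step 5, the split into the correlation and discreteness parts is a pair of row sums. The helper name `split_components` is hypothetical.

```python
import numpy as np


def split_components(U, k):
    """Split components (rows of U, sorted by decreasing eigenvalue) into U_p and U_e."""
    U = np.asarray(U, dtype=float)
    U_p = U[:k].sum(axis=0)   # correlation part: sum of the first k components
    U_e = U[k:].sum(axis=0)   # discreteness part: sum of the remaining m - k components
    return U_p, U_e
```

By construction, $U_p + U_e$ equals the sum of all m components at every observation time.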

Analysis of $U_p$ and $U_e$
The monitoring indexes in this study were determined based on the correlation and discreteness of multi-point displacements, corresponding to $U_p$ and $U_e$, respectively.
When $U_e$ changes noticeably while $U_p$ remains stable, the discreteness of the multi-point displacements increases and their correlation decreases, and the variance of the displacement observations along $u_2$ also increases. This corresponds to the change from condition (a) to condition (c) in 2D space in Fig 3. When all the included observation points are analyzed as a whole, the multi-point displacements normally show parallel or consistent variation trends. A sudden drop in the correlation of the displacement observations may therefore indicate that individual observation locations have been disturbed by unknown factors. Consequently, local abnormalities in the multiple observations can be detected, provided that all the observation points are in proper working order.
When $U_p$ changes noticeably while $U_e$ remains stable, the correlation and discreteness of the multi-point displacements remain essentially unchanged, whereas the mean value of the displacement observations along $u_1$ changes significantly. This corresponds to the change from condition (a) to condition (b) in 2D space in Fig 3. Such an integral and consistent change of the multi-point displacements indicates that the displacement field in this part of the dam changes as a whole.
Based on the aforementioned analysis, the monitoring indexes of $U_p$ and $U_e$ can be determined from their PDFs using the typical low probability method, in order to monitor the integral and local variation of the multi-point displacements. The next section introduces MEM to approximate the PDFs of $U_p$ and $U_e$.

Maximum entropy principle (MEP)
The Shannon entropy of a discrete random variable is defined as

$$H(x) = -\sum_{i=1}^{n} p_i \ln p_i \tag{12}$$

where $p_i$ is the probability of value $x_i$ (i = 1, 2, ..., n), and H(x) is the information entropy, which expresses the uncertainty of a stochastic system. For continuous probability distributions, the Shannon entropy is defined as

$$H(x) = -\int_{R} f(x) \ln f(x)\,dx \tag{13}$$

where f(x) is the PDF of the probability distribution and R is the domain of integration. Formulas (12) and (13) convey two points. On the one hand, the information entropy can be calculated from them if the probability $p_i$ or the PDF f(x) is already known; on the other hand, H(x) can be regarded as a functional of $p_i$ or f(x), so H(x) changes with $p_i$ or f(x) and can be used to decide the probability distribution. According to the statistical inference rule for probability distributions proposed by Jaynes [27], when only partial information is available, the probability distribution that maximizes the information entropy should be selected. Maximizing the entropy minimizes the artificial assumptions introduced to compensate for missing data, so the resulting solution is the least biased and most objective. MEM is also advantageous because its calculation is simple and fast.
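Formulas (12) and (13) can be evaluated numerically as a quick check of their meaning. The sketch below uses hypothetical helper names and a trapezoidal rule for the integral. It treats $0 \ln 0$ as 0, and it compares the numerical entropy of a normal density with the known closed form $\tfrac{1}{2}\ln(2\pi e\sigma^2)$.

```python
import numpy as np


def discrete_entropy(p):
    """Formula (12): H = -sum p_i ln p_i, with 0 * ln 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def continuous_entropy(f_vals, x):
    """Formula (13) on a grid: H = -integral f ln f dx, via the trapezoidal rule."""
    f = np.asarray(f_vals, dtype=float)
    x = np.asarray(x, dtype=float)
    integrand = np.where(f > 0, -f * np.log(np.where(f > 0, f, 1.0)), 0.0)
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(x)) / 2.0)
```

Among distributions on four outcomes, the uniform one attains the largest entropy, ln 4, which illustrates the "least committed" character of the maximum entropy choice.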

Maximum entropy probability density function
According to the principle of maximum entropy, the probability distribution with the minimum bias maximizes the information entropy H(x) subject to constraints derived from the known samples:

$$\max \; H(x) = -\int_{R} f(x)\ln f(x)\,dx \tag{14}$$

subject to

$$\int_{R} f(x)\,dx = 1 \tag{15}$$

$$\int_{R} x^{i} f(x)\,dx = \mu_i, \quad i = 1, 2, \ldots, N \tag{16}$$

where R is the domain of integration and $\mu_i$ (i = 1, 2, ..., N) is the ith-order origin (raw) moment, which can be estimated from the samples. Many studies have found the first four origin moments sufficient to describe the main characteristics of a random variable [22]; thus, N = 4 was adopted in the present case study. The first origin moment describes the center of the random variable; together with it, the second, third, and fourth origin moments determine, respectively, the dispersion of the random variable about its mean (variance), its symmetry (skewness), and its peakedness or tail weight (kurtosis).
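To make the moment constraints of Formula (16) concrete, the sketch below estimates the first four origin moments from a sample and converts them to mean, variance, skewness and kurtosis with the standard raw-to-central moment identities. The helper names are hypothetical, and population (1/n) moments are assumed.

```python
import numpy as np


def raw_moments(samples, n=4):
    """Sample origin moments mu_1..mu_n (population, 1/n normalization)."""
    x = np.asarray(samples, dtype=float)
    return np.array([np.mean(x ** i) for i in range(1, n + 1)])


def shape_from_raw(mu):
    """Mean, variance, skewness and kurtosis from the first four origin moments."""
    m1, m2, m3, m4 = mu
    var = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                   # third central moment
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return m1, var, c3 / var ** 1.5, c4 / var ** 2
```

For the symmetric sample 1, 2, 3, 4 the origin moments are 2.5, 7.5, 25 and 88.5, which give a variance of 1.25, zero skewness and a kurtosis of 1.64.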
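The Lagrange multiplier method was applied to solve this problem, and the corresponding Lagrange function was established as follows:

$$L = -\int_{R} f(x)\ln f(x)\,dx + (\lambda_0 + 1)\left[\int_{R} f(x)\,dx - 1\right] + \sum_{i=1}^{N}\lambda_i\left[\int_{R} x^{i} f(x)\,dx - \mu_i\right] \tag{17}$$

According to the stationary condition $\partial L / \partial f(x) = 0$, Formula (17) can be transformed into

$$-\ln f(x) + \lambda_0 + \sum_{i=1}^{N}\lambda_i x^{i} = 0 \tag{18}$$

The analytical form of the maximum entropy PDF is therefore

$$f(x) = \exp\left(\lambda_0 + \sum_{i=1}^{N}\lambda_i x^{i}\right) \tag{19}$$

Formula (19) shows that the maximum entropy PDF is obtained once the Lagrange multipliers $(\lambda_0, \lambda_1, \ldots, \lambda_N)$ are determined.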
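As a worked consistency check of the exponential-polynomial form (a special case added for illustration, not part of the original derivation), taking N = 2 with $\lambda_2 < 0$ and completing the square recovers the normal distribution with mean μ and standard deviation σ:

```latex
\begin{aligned}
f(x) &= \exp\left(\lambda_0 + \lambda_1 x + \lambda_2 x^2\right) \\
     &= \exp\left(\lambda_0 - \frac{\lambda_1^2}{4\lambda_2}\right)
        \exp\left[\lambda_2\left(x + \frac{\lambda_1}{2\lambda_2}\right)^2\right], \\
\lambda_2 &= -\frac{1}{2\sigma^2}, \qquad
\lambda_1 = \frac{\mu}{\sigma^2}, \qquad
\lambda_0 = -\frac{\mu^2}{2\sigma^2} - \ln\left(\sigma\sqrt{2\pi}\right).
\end{aligned}
```

In other words, with only the first two moments constrained, maximum entropy yields the normal density; the third and fourth moments are what allow the MEM density to become skewed or heavy-tailed.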
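Substituting Formula (19) into Formula (15) gives

$$\int_{R}\exp\left(\lambda_0 + \sum_{i=1}^{N}\lambda_i x^{i}\right)dx = 1 \tag{20}$$

$$e^{-\lambda_0} = \int_{R}\exp\left(\sum_{i=1}^{N}\lambda_i x^{i}\right)dx \tag{21}$$

$$\lambda_0 = -\ln\int_{R}\exp\left(\sum_{i=1}^{N}\lambda_i x^{i}\right)dx \tag{22}$$

Substituting Formulas (19) and (22) into Formula (16) gives

$$\frac{\int_{R} x^{i}\exp\left(\sum_{j=1}^{N}\lambda_j x^{j}\right)dx}{\int_{R}\exp\left(\sum_{j=1}^{N}\lambda_j x^{j}\right)dx} = \mu_i, \quad i = 1, 2, \ldots, N \tag{23}$$

For more convenient numerical calculation, Formula (23) can be rewritten as

$$r_i = 1 - \frac{\int_{R} x^{i}\exp\left(\sum_{j=1}^{N}\lambda_j x^{j}\right)dx}{\mu_i\int_{R}\exp\left(\sum_{j=1}^{N}\lambda_j x^{j}\right)dx}, \quad i = 1, 2, \ldots, N \tag{24}$$

where $r_i$ are residuals that are driven toward zero numerically. The Lagrange multipliers $(\lambda_1, \lambda_2, \ldots, \lambda_N)$ can be obtained by non-linear programming that minimizes the sum of the squared residuals $r_i$.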
Convergence is achieved when $r = \sum_{i=1}^{N} r_i^2 < \varepsilon$ or $|r_i| < \varepsilon$ for all i, where ε is the specified acceptable error.
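A minimal NumPy sketch of the solution procedure in Formulas (19) to (24) is given below, under several stated assumptions that depart from the text. First, it works on the standardized variable z = (x − mean)/std, which keeps powers up to $z^4$ well scaled. Second, it drives the difference residuals (model moment minus sample moment) to zero instead of the relative form of Formula (24), because the relative form is undefined when a sample moment is zero, as the first moment of a standardized variable is. Third, it uses a damped Gauss-Newton iteration with the analytic Jacobian, the covariance of the powers of z, in place of a general non-linear programming solver. The integration domain follows the (μ − 5σ, μ + 5σ) choice made for the numerical example. The function `fit_maxent` and its parameters are hypothetical.

```python
import numpy as np


def _trapz(y, x):
    # composite trapezoidal rule, written out so it works on any NumPy version
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def _powers(z, n):
    # rows are z**1, ..., z**n
    z = np.atleast_1d(np.asarray(z, dtype=float))
    return np.vstack([z ** i for i in range(1, n + 1)])


def fit_maxent(samples, n_moments=4, width=5.0, n_grid=2001, tol=1e-10, max_iter=100):
    x = np.asarray(samples, dtype=float)
    m, s = x.mean(), x.std(ddof=1)
    z = (x - m) / s                                  # assumption: solve in standardized units
    mu = np.array([np.mean(z ** i) for i in range(1, n_moments + 1)])
    grid = np.linspace(-width, width, n_grid)        # (a, b) = (mean - 5 std, mean + 5 std)
    P = _powers(grid, n_moments)                     # shape (n_moments, n_grid)

    def moments(lam):
        e = lam @ P
        g = np.exp(e - e.max())                      # shifted exponent avoids overflow
        w = g / _trapz(g, grid)                      # normalized density on the grid
        return w, np.array([_trapz(row * w, grid) for row in P])

    def residuals(lam):
        return moments(lam)[1] - mu                  # difference form of Formula (23)

    lam = np.zeros(n_moments)
    if n_moments >= 2:
        lam[1] = -0.5                                # start from the standard normal density
    r = residuals(lam)
    for _ in range(max_iter):
        if r @ r < tol:                              # sum of squared residuals small enough
            break
        w, mean_i = moments(lam)
        # Jacobian of the residuals: covariance of z^i and z^j under the current density
        J = np.array([[_trapz(P[i] * P[j] * w, grid) - mean_i[i] * mean_i[j]
                       for j in range(n_moments)] for i in range(n_moments)])
        step = np.linalg.solve(J, -r)                # Gauss-Newton step
        t = 1.0
        r_new = residuals(lam + step)
        while not (r_new @ r_new < r @ r) and t > 1e-6:
            t /= 2.0                                 # backtrack until the residual shrinks
            r_new = residuals(lam + t * step)
        if not (r_new @ r_new < r @ r):
            break                                    # no further progress possible
        lam, r = lam + t * step, r_new
    e = lam @ P
    lam0 = -(e.max() + np.log(_trapz(np.exp(e - e.max()), grid)))  # Formula (22)

    def pdf(xv):
        zv = (np.asarray(xv, dtype=float) - m) / s
        return np.exp(lam0 + lam @ _powers(zv, n_moments)) / s     # change of variables to x
    return pdf, lam0, lam, float(r @ r)
```

The returned density is only meaningful inside the integration domain, because a positive fourth-order multiplier would make it grow outside (a, b). Note that the multipliers refer to the standardized variable, so they are not directly comparable with multipliers fitted on the raw values.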
$\lambda_0$ is then obtained from Formula (22) by substituting the obtained Lagrange multipliers $(\lambda_1, \lambda_2, \ldots, \lambda_N)$, and the maximum entropy PDF f(x) is finally determined from Formula (19). Because these integrals are difficult to evaluate analytically, numerical integration is required, so the domain of integration R must be fixed in advance by assuming upper and lower bounds for the PDFs. Since the PDF f(x) of a random variable x generally has thin tails, its integral over the infinite domain $(-\infty, +\infty)$ can be approximated by that over a finite domain (a, b), and the truncation error is acceptably small if (a, b) is sufficiently wide. For example, for a normally distributed random variable x with mean μ and standard deviation σ, the truncation error is 0.27% if (a, b) is set to $(\mu - 3\sigma, \mu + 3\sigma)$. Because the actual probability distribution may be skewed, (a, b) was conservatively set to $(\mu - 5\sigma, \mu + 5\sigma)$ in the following numerical example.
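The truncation errors quoted above can be checked with the complementary error function: for a normal variable, the probability of falling outside $(\mu - k\sigma, \mu + k\sigma)$ is $\mathrm{erfc}(k/\sqrt{2})$. The helper name is hypothetical.

```python
import math


def outside_probability(k):
    """P(|X - mu| > k * sigma) for a normally distributed X."""
    return math.erfc(k / math.sqrt(2.0))
```

For k = 3 this gives about 0.0027 (0.27%), and for k = 5 it drops to roughly $6 \times 10^{-7}$, which is why the wider domain leaves room for skewed distributions at negligible cost.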
The PDFs of the annual maximum and minimum values of $U_p$ and $U_e$ can be approximated using MEM. The monitoring indexes are determined in the next section.

Monitoring indexes of correlation and discreteness
Based on the PDFs of the annual maximum and minimum values of $U_p$ and $U_e$, the monitoring indexes $(\delta_{p\min}, \delta_{p\max})$ and $(\delta_{e\min}, \delta_{e\max})$ can be determined for a given significance level α [28,29]. When $U_p$, the correlation part of the observed multi-point displacements, falls outside $(\delta_{p\min}, \delta_{p\max})$, the integral displacement of the dam is in a warning state, and forewarning measures must be taken against probable danger. When $U_e$, the discreteness part of the observed multi-point displacements, falls outside $(\delta_{e\min}, \delta_{e\max})$, a local abnormality may have occurred in the observed region; timely inspection should then be carried out and forewarning measures taken to guarantee safety.
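The corresponding exceedance probabilities and monitoring indexes $(\delta_{p\min}, \delta_{p\max})$ and $(\delta_{e\min}, \delta_{e\max})$ can be derived from

$$P(U < \delta_{\min}) = \int_{-\infty}^{\delta_{\min}} f_{\min}(\delta)\,d\delta = \alpha, \qquad P(U > \delta_{\max}) = \int_{\delta_{\max}}^{+\infty} f_{\max}(\delta)\,d\delta = \alpha \tag{25}$$

where U stands for $U_p$ or $U_e$, and $f_{\min}(\delta)$ and $f_{\max}(\delta)$ are the PDFs of the annual minimum and maximum values of $U_p$ or $U_e$, respectively. The flowchart for determining the multi-point displacement monitoring indexes of the correlation and discreteness parts is presented in Fig 4.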
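Once $f_{\min}$ and $f_{\max}$ are available on a grid (for instance from a maximum entropy fit), the indexes in Formula (25) are the α and 1 − α quantiles of the respective cumulative distributions. The sketch below, with a hypothetical function name, integrates each PDF cumulatively with the trapezoidal rule and inverts it by linear interpolation. It assumes each PDF is strictly positive on its grid, and it renormalizes over the truncated domain (a, b), which is an added assumption.

```python
import numpy as np


def _cumtrapz(y, x):
    # cumulative trapezoidal integral starting at 0
    steps = (y[1:] + y[:-1]) * np.diff(x) / 2.0
    return np.concatenate([[0.0], np.cumsum(steps)])


def monitoring_index(grid_min, f_min, grid_max, f_max, alpha):
    """Return (delta_min, delta_max) with P(U < delta_min) = P(U > delta_max) = alpha."""
    grid_min = np.asarray(grid_min, dtype=float)
    grid_max = np.asarray(grid_max, dtype=float)
    F_min = _cumtrapz(np.asarray(f_min, dtype=float), grid_min)
    F_max = _cumtrapz(np.asarray(f_max, dtype=float), grid_max)
    F_min, F_max = F_min / F_min[-1], F_max / F_max[-1]   # renormalize on (a, b)
    delta_min = float(np.interp(alpha, F_min, grid_min))
    delta_max = float(np.interp(1.0 - alpha, F_max, grid_max))
    return delta_min, delta_max
```

With standard normal densities used for both extremes, α = 0.05 returns approximately (−1.645, 1.645) and α = 0.01 returns approximately (−2.326, 2.326), matching the familiar normal quantiles.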

Numerical example
The variable-radius concrete gravity arch dam located on the upper reaches of the Qingyi River, a tributary of the Yangzi River, is an important part of a comprehensive medium-sized hydropower project. Construction of the dam began in August 1958 and took 12 years. The elevation of the dam crest is 126.  observations of the multi-point radial displacements and water level are shown in Fig 6. The radial horizontal displacement is positive when the dam moves downstream and negative when it moves upstream.
In this study, two-stage monitoring indexes of correlation and discreteness were determined according to the practical operation of the project. The significance level α was set to 0.05 for the primary monitoring indexes and 0.01 for the secondary monitoring indexes, so the reliability of the monitoring indexes was 95% and 99%, respectively.

Extraction of correlation and discreteness parts using PCA
PCA was conducted on the observations of the six selected observation points, and the variance contribution percentages of the six extracted components are shown in Fig 7. The eigenvalues, variance contribution percentages, and CPVs are listed in Table 1. The CPV threshold c was set to 90% in this study. The first two components accounted for 91.37% of the information, exceeding 90%. Therefore, the correlation part $U_p$ is the sum of the first two components, and the discreteness part $U_e$ is the sum of the last four components:

$$U_p = u_1 + u_2, \qquad U_e = u_3 + u_4 + u_5 + u_6 \tag{26}$$

Generation of PDFs of $U_p$ and $U_e$ using MEM
In determining the monitoring indexes $(\delta_{p\min}, \delta_{p\max})$ and $(\delta_{e\min}, \delta_{e\max})$, the annual extreme values of $U_p$ and $U_e$ from 1972 to 2012 were selected to generate the PDFs. The first four moments of the annual extreme values of $U_p$ and $U_e$ are listed in Table 2.
The first four moments of the annual extreme values of $U_p$ and $U_e$ were substituted into Formula (24), and the non-linear least squares method was used to solve for the four Lagrange multipliers $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$. $\lambda_0$ was then obtained by substituting these multipliers into Formula (22). Table 3 presents the solutions for all five Lagrange multipliers.

Determination of the monitoring indexes of $U_p$ and $U_e$
For the correlation part $U_p$, the primary warning monitoring index is (-12.52, 7.52) for α = 5%, and the secondary warning monitoring index is (-13.37, 8.51) for α = 1%. For the discreteness part $U_e$, the primary warning monitoring index is (-8.30, -2.07) for α = 5%, and the secondary warning monitoring index is (-9.99, -1.77) for α = 1%. Table 4 presents the two-stage monitoring indexes of $U_p$ and $U_e$ determined by the proposed method and by the KS method, and Fig 9 shows the probability density curves of the two methods. The results of the proposed method are close to those of the KS method, and the probability density curves of the two methods are fairly consistent. However, the probability distributions approximated by the KS method depend on pre-selected reference distributions and are therefore more subjective, whereas MEM generates the probability distributions using only the moment information of the multi-point displacement observations and is thus more objective. Therefore, the monitoring indexes from the proposed method are considered more rational and are recommended for practical monitoring and operation.

Conclusions
This paper presents a method for establishing monitoring indexes of the correlation and discreteness of multi-point displacements of concrete dams using PCA and MEM. The correlation and discreteness parts of multi-point displacements were extracted and constructed using PCA; they describe the integral and local variation trends of the dam displacement. The PDFs of the two parts were approximated by MEM, an effective approach for establishing a probability density distribution with the least subjectivity given a finite number of moments. The monitoring indexes of the two parts could then be determined for a given significance level. The feasibility of the proposed method was demonstrated by a numerical example, whose results show that the proposed method can determine rational and accurate multi-point monitoring indexes for concrete dam displacement. A comparison of the results from the proposed method and the KS method confirms the accuracy of the proposed method, and the monitoring indexes determined by the proposed method are recommended because of their least subjectivity. However, the accuracy of the monitoring indexes obtained by the proposed method depends on whether the observation data cover the most unfavorable load combination, and structural mechanics analysis was not considered in this study. Future research will address these limitations to make the monitoring indexes more rational.