Asymptotic Properties of Pearson's Rank-Variate Correlation Coefficient under Contaminated Gaussian Model

This paper investigates the robustness properties of Pearson's rank-variate correlation coefficient (PRVCC) in scenarios where one channel is corrupted by impulsive noise and the other is impulsive noise-free. As shown in our previous work, these scenarios that frequently encountered in radar and/or sonar, can be well emulated by a particular bivariate contaminated Gaussian model (CGM). Under this CGM, we establish the asymptotic closed forms of the expectation and variance of PRVCC by means of the well known Delta method. To gain a deeper understanding, we also compare PRVCC with two other classical correlation coefficients, i.e., Spearman's rho (SR) and Kendall's tau (KT), in terms of the root mean squared error (RMSE). Monte Carlo simulations not only verify our theoretical findings, but also reveal the advantage of PRVCC by an example of estimating the time delay in the particular impulsive noise environment.


Introduction
Correlation coefficients are indices that depict the strength of statistical relationship between two random variables obeying a joint probability distribution [1]. In general, correlation coefficients should be large and positive if there is a high probability that large (small) values of one variable occur in conjunction with large (small) values of another; and it should be large and negative if the direction reverses [2]. Due to their theoretical and algorithmic advantages, correlation coefficients have been widely used in many sub-areas of signal processing [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. Among many methods of correlation analysis in practice, Pearson's product moment correlation coefficient (PPMCC), Kendall's tau (KT) and Spearman's rho (SR) are perhaps the most prevalent ones [19].
There are many advantages and disadvantages to these three classical coefficients. PPMCC is optimal under the bivariate normal model (BNM), and is appropriate mainly for characterizing linear correlations. However, it will output misleading results if nonlinearity is involved in the data. On the other hand, the two rank-based coefficients, SR and KT, are not as powerful as PPMCC when the data follows bivariate normal distributions. Nevertheless, they are invariant under increasing monotone transformations, which makes them more suitable for many nonlinear cases in practice. Moreover, theoretical and empirical results indicate that SR and KT surpass PPMCC when data are corrupted by impulsive noise [12,20]. Besides these three classical coefficients, some methods such as Pearson's rank-variate correlation coefficient (PRVCC) [21], Gini correlation (GC) [22], and order statistics correlation coefficient (OSCC) [23][24][25] among others [26][27][28][29], have also been proposed in the literature.
It was known long before that PPMCC is an optimal estimator of the population correlation coefficient, in the sense of unbiasedness and approaching the Cramer-Rao lower bound for large samples under bivariate normal models [30]. Despite its desired properties just mentioned, PPMCC might not be applicable under the circumstances where the data are corrupted by impulsive noise, that is, the distribution of data deviates from the BNM. Consider the following scenario that frequently occurs in radar, sonar or communication. We have a prescribed signal, whether deterministic or stochastic, whose statistical property is priorly known. Our purpose is to estimate the correlation between this ''clean'' signal and the associated distorted version from the receiver that might be corrupted by a tiny fraction of impulsive noise (outliers with very large variance [31][32][33]). To deal with such case, one might adopt the conventional strategy, that is, ranking the cardinal variable(s) and resorting afterwards to SR or KT [22], which are robust against both nonlinearity and impulsive noise [20]. However, using only ranks of the two variables, we unavoidably lose useful information embedded in the variates of the ''clean'' variable. A better strategy would be to rely on coefficients, such as PRVCC [21], that accommodate both ordinal and cardinal information contained in the samples. Our purpose in this work is thus to investigate the properties of the historical PRVCC by both theoretical and empirical means.
The contribution in this work is twofold. Firstly, we establish the asymptotic closed forms of the expectation and variance of PRVCC under a contaminated Gaussian model that emulates a frequently encountered scenario in practice. Secondly, we demonstrate the superiority of PRVCC over PPMCC, SR and KT, in terms of the root mean squared error, by an example of estimating the time delay in the particular impulsive noise environment. These theoretical and empirical findings might be helpful to rejuvenate the historical PRVCC, which has long been forgotten in the literature due to insufficient understanding on its theoretical properties.
For convenience of later discussion, we employ symbols E( : ), V( : ), C( : , : ) and corr( : , : ) to denote the mean, variance, covariance and correlation of (between) random variables, respectively. Univariate and bivariate normal distributions are denoted by N (m,s 2 ) and N (m 1 ,m 2 ,s 2 1 ,s 2 2 ,r), respectively. The sign^reads ''is approximately equal to'', whereas the sign ¼ D stands for ''is defined as''. The notation u(t)~O(v(t)) denotes that u(t)=v(t)?0 as t?L(might be infinite) [34]. The symbol n ½m stands for the product n(n{1) Á Á Á (n{mz1). Other notations will be defined where it first enters the text.

Methods
This section presents the definitions of PRVCC as well as a particular CGM model simulating the impulsive noise environment mentioned in the previous section. Moreover, some auxiliary results are also established for further theoretical analysis.

Definitions of PRVCC
Let X and Y be two random variables following a continuous bivariate distribution. Denote by F X and F Y the marginal distributions of X and Y , respectively. Then, according to a historical paper of Pearson [21], one of the population versions of PRVCC can be defined, in modern notation, by Exchanging the roles of X and Y in (1) yields the other version Let f(X i ,Y i )g n i~1 be n independent and identically distributed (i.i.d.) data pairs drawn from a continuous bivariate population. After rearranging fX i g n i~1 in ascending order, we get a new sequence X (1) v Á Á Á vX (n) , which is termed the order statistics of X [35][36][37]. Suppose that X j is at the kth position in the sorted sequence. The integer k[½1,n is named the rank of X j and is denoted by P j [fP j g n j~1 . Let j j represent the arithmetic mean of n data points fj i g n i~1 . Then, based on (1), the sample version with respect to r P H (Y ,X ) can be constructed as where fF F X (X i )g n i~1 is the empirical cdf of fX i g n i~1 . Substituting the relationshipF F X (X i )~P i =n into (2) along with some simplifications leads to Exchanging the role of X and Y in (3) gives another sample version r H (X i ,Y i ) with respect to r P H (X ,Y ). Note that in general r H (X ,Y )=r H (Y ,X ). The choice between r H (X i ,Y i ) and r H (Y i ,X i ) depends on different roles played by X and Y in the scenario mentioned in the previous section. To avoid redundancy, we will focus on the properties of r H (Y i ,X i ) which is abbreviated as r H in the sequel unless ambiguity occurs.

Contaminated Gaussian Model
To simulate the specific circumstance remarked in Section Introduction, throughout we utilize the following CGM representing the joint probability density function (pdf) of two random variables X and Y [20] (1{e)N (m x ,m y ,s 2 x ,s 2 y ,r)zeN (m' x ,m' y ,s' 2 where 0ve%1, m x~m ' x~my~m ' y , s' y~sy , s' x &s x and s' x &s y . Under this specific CGM, it is obvious that the marginal distribution of Y is N (m y ,s 2 y ), whereas the marginal distribution of X is (1{e)N (m x ,s 2 x )zeN (m x ,s' 2 x ). In other words, under Model (4), Y stands for a ''clean'' normal variable while X stands for a ''dirty'' variable corrupted by a tiny fraction of Gaussian component with vary large variance (might tending to infinity). In this model, the parameter r, which is considered of interest, is what we aim at estimating as accurate as possible, while r' and e are interferences we seek to suppress. For the reason why Model (4) can happen in practice, please see Appendix A in our previous work [20].

Auxiliary Results
To establish our major results of Theorem 1 in the next subsection, some auxiliary results summarized in Lemma 1 below are mandatory.
In the second column n and Table 2. Quantities for Evaluating A~P n i,j,k,l EfH( In the third column n  Table 3. Quantities for Evaluation of K in Eq. (45).

Asymptotic Mean and Variance of PRVCC Under CGM (4)
By applying the delta method [38] with Lemma 1, we are ready to establish the closed forms of the mean and variance of PRVCC for samples generated by CGM (4).
Proof. From (3), it is easy to verify that r H is shift invariant. Therefore, we lose no generality by assuming that m x~my~0 hereafter. For convenience, write U~1 n(n{1) Then, by the well known delta method [38], it follows that and V(r H )^3 n nz1 Obviously, we only need to evaluate E(U), E(V ), V(U), V(V ) and C(U,V ) in order to work out (25) and (26). From the theorem assumption, both fY j g n 1 j~1 and fY' j g n 2 j'~1 follow the same normal distribution N (0,s 2 y ), then (n{1)V =s 2 y obeys a x 2 n{1 distribution, which means that Using Lemma 1 with some tedious algebra, we can obtain E(U), V(U) and C(U,V ). We first derive E(U). From the definition (23) and the relationships P i~P n j~1 H(X i {X j )z1 [40], Table 4. Cont.
Expanding and recalling that E(Y )~0, it follows that Since, by definition, f(X i ,Y i )g n i~1 is a mixture of f(X j ,Y j )g n 1 j~1 and f(X 0 j' ,Y 0 j' )g n 2 j'~1 , (30) can be expanded as which becomes (32) after some straightforward algebra along with the assistance of (8) in Lemma 1.

E(U)~s
Next we evaluate V(U), which can be written as where and The expression of D is easily obtained by substituting (31) into (34). For convenience, denote by B 1 and B 2 the two triple summations of B in (36). Then it follows that B 1 is decomposable into eight sub-triple summations which can be further partitioned into 16 disjoint and exhaustive subsets that listed in Table 1. An application of (7) to Table 1  Thus B~B 1 zB 2~n (n{1)s 2 y : The quadruple summation in (37) can be decomposed as Expanding (38) according to different suffixes of (i j k l) and (i' j' k' l'), we obtain 16 sub-quadruple summations which can be further partitioned into 56 disjoint and exhaustive subsets. In other words, A is a summation of 56 integrals of the form EfH(W 1 )H(W 2 )W 3 W 4 g, i.e., the I ' -terms, weighted by corresponding subset cardinality, i.e., the a ' -terms. By substituting into (5) the corresponding parameters tabulated in Table 2 as well as exploiting the symmetry of (38), we obtain the expression of A.
Substituting the expressions of A, B, C and D into (33) and tidying up lead to the expression of V(U) in (39).
Finally we deal with C(U,V ). Write and Then U~U 1 zU 2 . Now Since we have E(U) in (32) and E(V ) in (27), the second term in (42) can be easily obtained, as whereas the first term, E(U 1 V ), can be written as where : can be regarded as W -terms in Lemma 1, i.e., As mentioned above, X is a union of X and X 0 , and Y is a union of Y and Y 0 , the triple summation in (45) can be split into eight terms as Using (6) in Lemma 1 along with the corresponding parameters tabulated in Table 3, we can work out each sub-triple summation in (46). A series of straightforward algebra leads readily to (47).
which are the contamination-free versions of PRVCC.

Results and Discussion
In this section we verify the correctness of Theorem 1 by Monte Carlo simulations. To gain a further insight about PRVCC, we also compare it with the two classical correlation coefficients,i.e., SR and KT, in terms of RMSE. At last, we will provide an examples of time-delay estimation under CGM (4). Since the theoretical results in Theorem 1 only hold true for large sample size n and small e, in this section we set the sample size n~100 and 0:02ƒeƒ0:1. All samples are generated by suitable functions in the Matlab environment. For the sake of accuracy, the number of Monte Carlo trials is set to be 10 5 unless otherwise stated.
1 Verification of Theorem 1 Good agreements are observed between the simulation results and the theoretical counterparts. It can also be observed that the larger the contamination fraction e and difference between r and r', the bigger the bias between E(r H ) and the ideal dashed curve corresponding to e~0. Figure 2. verifies the correctness of the variance of PRVCC, by plotting the simulation results (circles) and the theoretical results of (22) (solid lines) concerning V(r H ) in the same scenarios as in Figure 1. For the purpose of comparison, the contamination-free version (50) (dashed lines) is also included in each subplot to highlight the effects of r' and e. Note that we have multiplied V(r H ) by n for a better visual effect. This figure shows good agreements between the simulation results and the corresponding theoretical ones. Moreover, it is seen that when r'~0, the curves are symmetric and the magnitude of V(r H ) increase with e, especially for DrD large. On the other hand, when r'=0, the curves are no longer asymmetric. Specifically, for DrD large, V(r H ) increases if r and r' have opposite signs; and it decreases if r and r' have the same signs. When e is fixed, From these two figures, it follows that, although derived based on the assumptions n?? and e?0, our theoretical results established in Theorem 1 are sufficiently accurate for n as small as 100 and e as large as 0:08. This means that, PRVCC is applicable to many situations in practice, such as radar and biomedical engineering, where the sample size n is much larger than 100 and the fraction of impulsive interference e is much lower than 0:08.

RMSE Comparison of PRVCC with SR amd KT
To deepen the understanding of PRVCC, in this subsection we compare in terms of RMSE the performance of PRVCC with SR and KT, which are also robust under the CGM (4) as shown in our previous work [20].
For fairness of comparison, some calibrations are necessary. From (21), it follows lim e?0 n??
by which we can define an asymptotic unbiased estimator of the population correlation r, aŝ The other two asymptotic unbiased estimators based on KT and SR are defined as [20]r Given definitions of ther r above, we can then compare their performance in terms of the popular RMSE defined by The CGM based on (4) is set to be  Table 4, where the minima with respect tor r H ,r r K andr r S are highlighted in bold font in a rowwise manner within each of the eight blocks. It appears that 1) all RMSEs are quite small, meaning that these three estimators perform similarly well under CGM (4); 2)r r H outperformsr r K andr r S in the cases that DrD takes values from 0 to medium magnitudes; 3)r r K outperformsr r H for r around +1; 4) r r S plays an intermediate role betweenr r H andr r K . Note that the RMSE values for r'~1 are not shown in Table 4 due to symmetry.

Example of Time-Delay Estimation
As remarked in Section Introduction, it is often encountered in radar, sonar or communication that we need to estimate the correlation between a prescribed ''clean'' signal with a distorted version corrupted by impulsive noise. Now we provide an example of time-delay estimation which is similar to this situation. In this example, the prescribed clean signal is a segment of sinusoidal wave y½i~f ffiffi ffi 2 p sin 10p|i 500 0ƒiv500 0 otherwise , whereas the corrupted signal is x½i~y½i{t 0 zw½i with w½i being a white contaminated Gaussian noise following the distribution of (1{e)N (0,s 2 )zeN (0,s' 2 ) where e~0:05 and s'~10 4 &s. The time-delay t 0 is set to be 1000 ms. Our purpose is to estimate t 0 as accurate as possible under various signal to noise ratio SNR ¼ D 20 log 10 (1=s)~{20 log 10 s. As illustrated in Figure 3, the procedure of estimating t 0 includes two steps. The first one is to construct a correlation function that corresponds to t by each of r H , r K and r S with respect to x½i and y½i{t. The second one is to locate time-shiftt t 0 corresponding the maximum of the correlation function. The value oft t 0 is considered to be an estimate of t 0 and restored for further analysis. Note that the number of Monte Carlo trials in this study is set to be 10 4 . Table 5 summarizes the estimates of t 0 from r H , r K and r S . It is observed that all three methods produce acceptable estimates of t 0 , for an SNR even as low as {10 dB. However, PRVCC is slightly better than the other two, in the sense of giving smaller biases and standard deviations in most cases.

Conclusions
This paper systematically investigates the statistical properties of the historical PRVCC under a particular contaminated Gaussian model. As shown in our previous work [20], this model simulates reasonably some frequently encountered scenarios where one variable is clean and the other corrupted by a tiny fraction of impulsive noise with very large variance. Under this model, we establish the asymptotic closed forms of the expectation and variance of PRVCC by means of the well known Delta method. To gain a further insight on PRVCC, we also compare it with two other classical correlation coefficients, i.e., SR and KT, in terms of the popular RMSE. Monte Carlo simulations not only verify our theoretical findings, but also reveal the strength and weakness of PRVCC in various occasions. The theoretical and empirical findings in this work are believed to add new knowledge to the area of correlation analysis that prevails in many branches of science and engineering.