Fisher Information as a Metric of Locally Optimal Processing and Stochastic Resonance

The origins of Fisher information are in its use as a performance measure for parametric estimation. We augment this and show that the Fisher information can characterize the performance in several other significant signal processing operations. For processing of a weak signal in additive white noise, we demonstrate that the Fisher information determines (i) the maximum output signal-to-noise ratio for a periodic signal; (ii) the optimum asymptotic efficacy for signal detection; (iii) the best cross-correlation coefficient for signal transmission; and (iv) the minimum mean square error of an unbiased estimator. This unifying picture, via inequalities on the Fisher information, is used to establish conditions where improvement by noise through stochastic resonance is feasible or not.


Introduction
Fisher information is foremost a measure of the minimum error in estimating an unknown parameter of a probability distribution, and its importance is tied to the Cramér-Rao inequality for unbiased estimators [1,2]. By introducing a location parameter, de Bruijn's identity shows that Fisher information is also linked to the differential entropy, which measures the minimum descriptive complexity of a random variable [1]. Furthermore, in the detection of a weak known signal, a locally optimal detector, acting as the small-signal limit of the Neyman-Pearson detector, has favorable properties at small signal-to-noise ratios [3]. With a sufficiently large number of observations, and using the central limit theorem, it has been demonstrated that the locally optimal detector is asymptotically optimum and that the Fisher information of the noise distribution is the upper bound of the asymptotic efficacy [2][3][4][5][6][7]. For weak random signal detection, the second-order Fisher information is likewise associated with the maximum asymptotic efficacy of the generalized energy detector [4][5][6][7].
However, the fundamental role of Fisher information in processing weak signals is not adequately recognized. To extend the heuristic studies of [1][2][3][4][5][6][7], in this paper we demonstrate theoretically that, for a weak signal buried in additive white noise, the performance of locally optimal processing can be generally measured by the Fisher information of the noise distribution. We show this for the following signal processing case studies: (i) the maximum output signal-to-noise ratio for a periodic signal; (ii) the optimum asymptotic efficacy for signal detection; (iii) the best cross-correlation coefficient for signal transmission; and (iv) the minimum mean square error of an unbiased estimator. The physical significance of Fisher information is that it provides a unified bound for characterizing the performance of locally optimal processing. Furthermore, we establish the Fisher information condition for stochastic resonance (SR), which has been studied for improving system performance over several decades. In our recent work [28], it was established that improvement by adding noise is impossible for detecting a weak known signal.
Here, based on Fisher information inequalities, we further prove that SR is not applicable for improving the performance of locally optimal processing in the considered cases (i)-(iv). This result generalizes a proof that existed previously only for a weak periodic signal in additive Gaussian noise [12,33]. However, beyond these restrictive conditions, the observed noise-enhanced effects [9][10][11], [26], [28][29][30] show that SR can provide a signal processing enhancement using the constructive role of noise. The applications of SR to nonlinear signal processing are of practical interest.

Results
In many situations we are interested in processing signals that are very weak compared to the noise level [2,3,6]. It would be desirable in these situations to determine an optimal memoryless nonlinearity in the following study cases.

Output signal-to-noise ratio for a periodic signal
First, consider a static nonlinearity with its output
$$y(t) = g[x(t)], \tag{1}$$
where the function $g$ is a memoryless nonlinearity and the input is a signal-plus-noise mixture $x(t) = s(t) + z(t)$. The component $s(t)$ is a known weak periodic signal with maximal amplitude $A$ ($0 \le |s(t)| \le A$) and period $T$. The zero-mean white noise $z(t)$, independent of $s(t)$, has probability density function (PDF) $f_z$ and root-mean-square (RMS) amplitude $\sigma_z$. It is assumed that $g$ has zero mean under $f_z$, i.e. $E[g(x)] = \int_{-\infty}^{\infty} g(x) f_z(x)\,dx = 0$, which is not restrictive since any arbitrary $g$ can always include a constant bias to cancel this average [6]. The input signal-to-noise ratio for $x(t)$ can be defined as the power contained in the spectral line $1/T$ divided by the power contained in the noise background in a small frequency bin $\Delta B$ around $1/T$ [10], that is
$$R_{in} = \frac{|\langle s(t)\exp(-i2\pi t/T)\rangle|^2}{\sigma_z^2\,\Delta t\,\Delta B}, \tag{2}$$
with $\Delta t$ indicating the time resolution or the sampling time in a discrete-time implementation, and the temporal average defined as $\langle \cdots \rangle = \frac{1}{T}\int_0^T \cdots\,dt$ [10]. Here, we assume the sampling time $\Delta t \ll T$ and observe the output $y(t)$ over a sufficiently large time interval $NT$ ($N \gg 1$) [10]. Since $s(t)$ is periodic, $y(t)$ is in general a cyclostationary random signal with period $T$ [10]. Similarly, the output signal-to-noise ratio for $y(t)$ is given by
$$R_{out} = \frac{|\langle E[y(t)]\exp(-i2\pi t/T)\rangle|^2}{\langle \mathrm{var}[y(t)]\rangle\,\Delta t\,\Delta B}, \tag{3}$$
with nonstationary expectation $E[y(t)]$ and nonstationary variance $\mathrm{var}[y(t)]$ [10].
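In the case of $A \to 0$, we have a Taylor expansion of the expectation at a fixed time $t$ as
$$E[y(t)] = \int_{-\infty}^{\infty} g(x) f_z(x - s(t))\,dx \approx \int_{-\infty}^{\infty} g(x)\,[f_z(x) - s(t) f_z'(x)]\,dx = s(t)\,E[g'(x)], \tag{4}$$
where we assume the derivatives $g'(x) = dg(x)/dx$ and $f_z'(x) = df_z(x)/dx$ exist for almost all $x$ (similarly hereinafter) [2,6]. The last step uses the zero mean of $g$ under $f_z$ and an integration by parts. Thus, we have
$$\mathrm{var}[y(t)] = E[y^2(t)] - E^2[y(t)] \approx E[g^2(x)] + 2s(t)\,E[g(x)g'(x)] - s^2(t)\,E^2[g'(x)] \approx E[g^2(x)], \tag{5}$$
where the terms proportional to $s(t)$ and $s^2(t)$, compared with $E[g^2(x)]$, can be neglected as $A \to 0$ ($0 < |s(t)| \le A$) [2,6]. The above derivations of Eqs. (4) and (5) are exact in the asymptotic limit for weak signals, and have been generally adopted in [2,6].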
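Substituting Eqs. (4) and (5) into Eq. (3), we have
$$R_{out} \approx \frac{|\langle s(t)\exp(-i2\pi t/T)\rangle|^2}{\Delta t\,\Delta B}\,\frac{E^2[g'(x)]}{E[g^2(x)]} \le \frac{|\langle s(t)\exp(-i2\pi t/T)\rangle|^2}{\Delta t\,\Delta B}\,E\!\left[\frac{f_z'^2(x)}{f_z^2(x)}\right], \tag{6}$$
where $E[f_z'^2(x)/f_z^2(x)] = \int_{-\infty}^{\infty} [f_z'^2(x)/f_z(x)]\,dx$ is simply the Fisher information $I(f_z)$ of the noise PDF $f_z$ [2,6], and the equality occurs as
$$g(x) = g_{opt}(x) = C\,f_z'(x)/f_z(x), \tag{7}$$
by the Cauchy-Schwarz inequality for a constant $C$ [2,6]. Noting Eqs. (2) and (6), the output-input signal-to-noise ratio gain $G$ is bounded by
$$G = \frac{R_{out}}{R_{in}} \approx \sigma_z^2\,\frac{E^2[g'(x)]}{E[g^2(x)]} \le \sigma_z^2\,I(f_z) = I(f_{z_0}), \tag{8}$$
with equality achieved when $g$ takes the locally optimal nonlinearity $g_{opt}$ of Eq. (7). Here, for a standardized PDF $f_{z_0}$ with zero mean and unit variance $\sigma_{z_0}^2 = 1$, the scaled noise $z(t) = \sigma_z z_0(t)$ has PDF $f_z(z) = f_{z_0}(z/\sigma_z)/\sigma_z$, and the Fisher information satisfies
$$\sigma_z^2\,I(f_z) = I(f_{z_0}) \ge 1, \tag{9}$$
where the standardized Gaussian PDF $f_{z_0}(x) = \exp(-x^2/2)/\sqrt{2\pi}$ has the minimal Fisher information $I(f_{z_0}) = 1$ and any standardized non-Gaussian PDF $f_{z_0}$ has Fisher information $I(f_{z_0}) > 1$ [2]. It can be seen that the linear system $g_L(x) = x$ has output signal-to-noise ratio $R_{out} = R_{in}$ in Eq. (3). Thus, the output-input signal-to-noise ratio gain $G$ in Eq. (8) also represents the expected performance improvement of the nonlinearity $g$ over the linear system $g_L$.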
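As a rough numerical illustration of this bound (a sketch under stated assumptions, not part of the original analysis), the ratio $\sigma_z^2 E^2[g'(x)]/E[g^2(x)]$ can be evaluated by quadrature. The sketch uses the integration by parts $E[g'(x)] = -\int g(x) f_z'(x)\,dx$, so that non-differentiable nonlinearities such as the sign function are allowed. Gaussian noise, the grid limits, the three odd test nonlinearities and all function names are illustrative choices.

```python
import math

def gauss_pdf(x, sigma=1.0):
    """Zero-mean Gaussian PDF with RMS amplitude sigma (assumed noise model)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_pdf_prime(x, sigma=1.0):
    """Derivative of the Gaussian PDF with respect to x."""
    return -x / sigma ** 2 * gauss_pdf(x, sigma)

def snr_gain(g, sigma=1.0, half_width=12.0, n=24001):
    """Approximate sigma^2 * E[g']^2 / E[g^2] in Gaussian noise by a Riemann sum.

    E[g'] is computed as -integral of g(x) f'(x) dx (integration by parts).
    The nonlinearity g is assumed to have zero mean under the noise PDF.
    """
    dx = 2.0 * half_width * sigma / (n - 1)
    num = 0.0
    den = 0.0
    for k in range(n):
        x = -half_width * sigma + k * dx
        num += g(x) * gauss_pdf_prime(x, sigma)
        den += g(x) ** 2 * gauss_pdf(x, sigma)
    return sigma ** 2 * (num * dx) ** 2 / (den * dx)

def sign(x):
    """Sign nonlinearity, returning -1, 0 or 1."""
    return (x > 0) - (x < 0)

gain_linear = snr_gain(lambda x: x)      # linear system
gain_sign = snr_gain(sign)               # hard limiter
gain_cubic = snr_gain(lambda x: x ** 3)  # cubic nonlinearity
fisher_bound = 1.0                       # I(f_z0) of the standardized Gaussian PDF

print(gain_linear, gain_sign, gain_cubic, fisher_bound)
```

For Gaussian noise these three values can be checked analytically: they are 1, 2/π and 3/5, so only the linear system reaches the Fisher information bound of 1.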

Optimum asymptotic efficacy for signal detection
Secondly, consider an observation vector $X = (X_1, X_2, \ldots, X_N)$ with components
$$X_n = \theta s_n + z_n, \quad n = 1, 2, \ldots, N,$$
where the components $z_n$ form a sequence of independent and identically distributed (i.i.d.) random variables with PDF $f_z$, and the known signal components $s_n$ are scaled by the signal strength $\theta$ [6]. For the known signal sequence $\{s_n, n = 1, 2, \ldots, N\}$, it is assumed that there exists a finite (non-zero) bound $A$ such that $0 \le |s_n| \le A$, and that the asymptotic average signal power is finite and non-zero, i.e. $0 < P_s^2 = \lim_{N\to\infty} \sum_{n=1}^{N} s_n^2/N < \infty$ [6]. Then, the detection problem can be formulated as a hypothesis-testing problem of deciding between a null hypothesis $H_0$ ($\theta = 0$) and an alternative hypothesis $H_1$ ($\theta > 0$) describing the joint density function of $X$, with
$$H_0: f_X(X) = \prod_{n=1}^{N} f_z(X_n), \qquad H_1: f_X(X) = \prod_{n=1}^{N} f_z(X_n - \theta s_n).$$
Consider a generalized correlation detector
$$T_{GC}(X) = \sum_{n=1}^{N} g(X_n)\,s_n, \tag{11}$$
where the memoryless nonlinearity $g$ has zero mean under $f_z$, i.e. $E[g(x)] = 0$ [6]. In the asymptotic case of $\theta \to 0$ and $N \to \infty$, the test statistic $T_{GC}$, according to the central limit theorem, converges to a Gaussian distribution with mean $E[T_{GC}|H_0] = 0$ and variance $\mathrm{var}[T_{GC}|H_0] \approx N P_s^2 E[g^2(x)]$ under the null hypothesis $H_0$ [6]. Using Eqs. (4) and (5), $T_{GC}$ is asymptotically Gaussian with mean $E[T_{GC}|H_1] \approx \theta N P_s^2 E[g'(x)]$ and variance $\mathrm{var}[T_{GC}|H_1] \approx \mathrm{var}[T_{GC}|H_0]$ under the hypothesis $H_1$ [6].
Given a false alarm probability $P_{FA}$, the asymptotic detection probability $P_D$ for the generalized correlation detector of Eq. (11) can be expressed as [2,6]
$$P_D = Q\!\left[Q^{-1}(P_{FA}) - \sqrt{N}\,\theta P_s \sqrt{\xi_{GC}}\right], \tag{12}$$
with $Q(x) = \int_x^{\infty} \exp[-t^2/2]/\sqrt{2\pi}\,dt$ and its inverse function $Q^{-1}$ [2,6]. Thus, for fixed $N$ and $\theta P_s$ (since the signal is known), $P_D$ is a monotonically increasing function of the normalized asymptotic efficacy $\xi_{GC}$ given by [6]
$$\xi_{GC} = \lim_{N\to\infty} \frac{\left\{\partial E[T_{GC}|H_1]/\partial\theta\,|_{\theta=0}\right\}^2}{N P_s^2\,\mathrm{var}[T_{GC}|H_0]} = \frac{E^2[g'(x)]}{E[g^2(x)]} \le I(f_z), \tag{13}$$
with equality being achieved when $g = g_{opt}$ in Eq. (7). This result also indicates that the asymptotically optimal detector is just the locally optimal detector established by the Taylor expansion of the likelihood ratio test statistic $\ln[\prod_{n=1}^{N} f_z(X_n - \theta s_n)/\prod_{n=1}^{N} f_z(X_n)] \approx \sum_{n=1}^{N} g_{opt}(X_n)\,\theta s_n$ ($C = -1$) in terms of the generalized Neyman-Pearson lemma [2,6].
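The dependence of the detection probability on the efficacy can be sketched numerically. The sketch below assumes the Gaussian approximation stated above, in which the test statistic has mean shift $\theta N P_s^2 E[g'(x)]$ and standard deviation $\sqrt{N P_s^2 E[g^2(x)]}$, so that $P_D = Q[Q^{-1}(P_{FA}) - \sqrt{N}\,\theta P_s \sqrt{\xi}]$ with $\xi = E^2[g'(x)]/E[g^2(x)]$. The function names and the numerical values (false alarm probability, sample size, signal strength, Laplacian noise) are illustrative assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and unit variance

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 1.0 - _STD_NORMAL.cdf(x)

def q_inverse(p):
    """Inverse of Q, valid for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)

def detection_probability(p_fa, n, theta, p_s, efficacy):
    """Asymptotic P_D of a generalized correlation detector (Gaussian approximation)."""
    deflection = math.sqrt(n) * theta * p_s * math.sqrt(efficacy)
    return q_function(q_inverse(p_fa) - deflection)

# Illustrative comparison in Laplacian noise with RMS amplitude sigma_z:
# the linear correlator has efficacy 1/sigma_z^2, while the locally optimal
# detector reaches the Fisher information 2/sigma_z^2.
sigma_z = 1.0
xi_linear = 1.0 / sigma_z ** 2
xi_optimal = 2.0 / sigma_z ** 2
pd_linear = detection_probability(0.01, 10_000, 0.02, 1.0, xi_linear)
pd_optimal = detection_probability(0.01, 10_000, 0.02, 1.0, xi_optimal)

print(pd_linear, pd_optimal)
```

With zero efficacy the expression reduces to $P_D = P_{FA}$, and a larger efficacy always gives a larger detection probability for the same $N$, $\theta$ and $P_s$.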
Interestingly, with $\xi_{LC} = E^2[g'(x)]/E[g^2(x)] = \sigma_z^{-2}$ achieved by a linear correlation detector ($g_{LC}(x) = x$ in Eq. (11)) as a benchmark [5,6], the asymptotic relative efficiency
$$\mathrm{ARE}_{GC,LC} = \frac{\xi_{GC}}{\xi_{LC}} = \sigma_z^2\,\frac{E^2[g'(x)]}{E[g^2(x)]} \le \sigma_z^2\,I(f_z) = I(f_{z_0}) \tag{14}$$
provides the asymptotic performance improvement of a generalized correlation detector over the linear correlation detector when both detectors operate in the same noise environment [5,6]. Next, consider the case where the weak signal components $s_n$ are random with PDF $f_s$. The hypothesis test becomes
$$H_0: f_X(X) = \prod_{n=1}^{N} f_z(X_n), \ \text{for } \theta = 0; \qquad H_1: f_X(X) = \prod_{n=1}^{N} \int_{-\infty}^{\infty} f_z(X_n - \theta s_n)\,f_s(s_n)\,ds_n, \ \text{for } \theta > 0,$$
for determining whether the random signal is present or not. Consider a generalized energy detector
$$T_{GE}(X) = \sum_{n=1}^{N} g(X_n), \tag{15}$$
where we also assume $E[T_{GE}|H_0] = 0$, and then $\mathrm{var}[T_{GE}|H_0] = N E[g^2(x)]$. Furthermore, in the asymptotic case of $\theta \to 0$, the expectation [6]
$$E[T_{GE}|H_1] = N \int_{-\infty}^{\infty} g(x) \int_{-\infty}^{\infty} f_z(x - \theta s)\,f_s(s)\,ds\,dx \approx \cdots \tag{16}$$
Thus, the efficacy of a generalized energy detector is defined as [6]
$$\xi_{GE} = \lim \cdots \tag{17}$$
where $\theta^2$ is treated as the signal strength parameter and $I_2(f_z) = E[f_z''^2(x)/f_z^2(x)]$ is the second-order Fisher information [6,7]. It is noted that the equality of Eq. (17) is achieved as $g(x) = g_{opt}(x) = C f_z''(x)/f_z(x)$ for a constant $C$ [6]. Given a false alarm probability $P_{FA}$, the asymptotic detection probability $P_D$ for the generalized energy detector of Eq. (15) is a monotonically increasing function of the efficacy $\xi_{GE}$ [5][6][7].

Cross-correlation coefficient for signal transmission
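Thirdly, we transmit a weak aperiodic signal $s(t)$ through the nonlinearity $g$ of Eq. (1) [13]. Here, the signal $s(t)$ has average signal variance $\sigma_s^2 \ll \sigma_z^2$, zero mean and upper bound $A$ ($0 \le |s(t)| \le A$). For example, $s(t)$ can be a sample of a uniformly distributed random signal taking values in a bounded interval. The input cross-correlation coefficient of $s(t)$ and $x(t) = s(t) + z(t)$ is defined as [2,13]
$$\rho_{s,x} = \frac{E[s(t)x(t)]}{\sqrt{E[s^2(t)]\,E[x^2(t)]}} = \frac{\sigma_s}{\sqrt{\sigma_s^2 + \sigma_z^2}} \approx \frac{\sigma_s}{\sigma_z}. \tag{18}$$
Using Eqs. (4) and (5), the output cross-correlation coefficient of $s(t)$ and $y(t) = g[x(t)]$ is given by
$$\rho_{s,y} = \frac{E[s(t)y(t)]}{\sqrt{E[s^2(t)]\,E[y^2(t)]}} \approx \sigma_s\,\frac{E[g'(x)]}{\sqrt{E[g^2(x)]}} \le \sigma_s\sqrt{I(f_z)}, \tag{19}$$
which attains its maximal value as $g = g_{opt}$ of Eq. (7). Then, the cross-correlation gain $G_\rho$ is bounded by
$$G_\rho = \frac{\rho_{s,y}}{\rho_{s,x}} \approx \sigma_z\,\frac{E[g'(x)]}{\sqrt{E[g^2(x)]}} \le \sigma_z\sqrt{I(f_z)} = \sqrt{I(f_{z_0})}. \tag{20}$$

Mean square error of an unbiased estimator
Finally, for the $N$ observation components $x_n = s_n(\theta) + z_n$, we assume the signal components $s_n(\theta)$ depend on an unknown parameter $\theta$. As the upper bound $A \to 0$ ($0 \le |s_n| \le A$), the Cramér-Rao inequality indicates that the mean squared error of any unbiased estimator $\hat{\theta}$ of the parameter $\theta$ is lower bounded by the reciprocal of the Fisher information [1,2], given by
$$E[(\hat{\theta} - \theta)^2] \ge \frac{1}{I(f_z)\sum_{n=1}^{N} (\partial s_n/\partial\theta)^2}, \tag{21}$$
which indicates that the minimum mean square error of any unbiased estimator is also determined by the Fisher information $I(f_z)$ of the noise distribution, as $\sum_{n=1}^{N} (\partial s_n/\partial\theta)^2$ is given.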
Therefore, just as the Fisher information represents the lower bound of the mean squared error of any unbiased estimator in signal estimation [1,2], the physical significance of the Fisher information $I(f_z)$ (or $I_2(f_z)$) is that it provides a unified upper bound on the performance of locally optimal processing in the considered signal processing cases.
To illustrate how the Fisher information bounds the performance of locally optimal processing, we show an example in Fig. 1. Consider generalized Gaussian noise with PDF
$$f_z(x) = \frac{c_1}{\sigma_z}\exp\!\left(-c_2\left|\frac{x}{\sigma_z}\right|^{\alpha}\right), \tag{22}$$
where $c_1 = \frac{\alpha}{2}\,\Gamma^{1/2}(3/\alpha)/\Gamma^{3/2}(1/\alpha)$ and $c_2 = [\Gamma(3/\alpha)/\Gamma(1/\alpha)]^{\alpha/2}$ for an exponential decay rate parameter $\alpha > 0$ [2,6]. The corresponding locally optimal nonlinearity is $g_{opt}(x) = |x|^{\alpha-1}\,\mathrm{sign}(x)$, and the output-input signal-to-noise ratio gain in Eq. (8) is
$$G = I(f_{z_0}) = \alpha^2\,\Gamma(3/\alpha)\,\Gamma(2 - 1/\alpha)/\Gamma^2(1/\alpha)$$
(solid line), as shown in Fig. 1. For comparison, we also operate the sign nonlinearity $g_S(x) = \mathrm{sign}(x)$ and the linear system $g_L(x) = x$ in the generalized Gaussian noise. The output-input signal-to-noise ratio gain in Eq. (8) of $g_S$ is $G = 4c_1^2 = \alpha^2\,\Gamma(3/\alpha)/\Gamma^3(1/\alpha)$ (red line), as shown in Fig. 1. For the linear system $g_L$, Eq. (8) indicates that $G = 1$ (dotted line) for $\alpha > 0$, as plotted in Fig. 1. It is seen in Fig. 1 that only for $\alpha = 1$ does the performance of $g_S$ attain that of the locally optimal nonlinearity $g_{opt}$. This is because the nonlinearity $g_S$ is precisely the locally optimal nonlinearity for Laplacian noise ($\alpha = 1$), for which the Fisher information limit $I(f_{z_0}) = 2$ is achieved. Likewise, for Gaussian noise ($\alpha = 2$), the linear system $g_L$ is optimal and the output-input SNR gain is $G = I(f_{z_0}) = 1$. It is noted that the above analyses are also valid for the asymptotic relative efficiency of Eq. (14) and the cross-correlation gain of Eq. (20).
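The three curves of this example can be sketched from closed-form expressions. The sketch below assumes the standardized (unit-variance) generalized Gaussian PDF, for which the Fisher information is $\alpha^2\,\Gamma(3/\alpha)\,\Gamma(2-1/\alpha)/\Gamma^2(1/\alpha)$ (finite only for $\alpha > 1/2$) and the gain of the sign nonlinearity is $4 f_{z_0}^2(0)$. The function names and the sampled values of $\alpha$ are illustrative.

```python
import math

def fisher_information_gg(alpha):
    """Fisher information of the unit-variance generalized Gaussian PDF.

    This is the gain of the locally optimal nonlinearity; it is finite
    only for alpha > 1/2, so smaller values are rejected here.
    """
    if alpha <= 0.5:
        raise ValueError("Fisher information is not finite for alpha <= 1/2")
    return (alpha ** 2 * math.gamma(3.0 / alpha) * math.gamma(2.0 - 1.0 / alpha)
            / math.gamma(1.0 / alpha) ** 2)

def gain_sign(alpha):
    """Gain of the sign nonlinearity: 4 * f_z0(0)^2, where f_z0(0) = c1."""
    c1 = 0.5 * alpha * math.sqrt(math.gamma(3.0 / alpha)) / math.gamma(1.0 / alpha) ** 1.5
    return 4.0 * c1 ** 2

def gain_linear(alpha):
    """Gain of the linear system, equal to one for any noise PDF."""
    return 1.0

for alpha in (0.75, 1.0, 1.5, 2.0, 3.0):
    print(alpha, fisher_information_gg(alpha), gain_sign(alpha), gain_linear(alpha))
```

At $\alpha = 1$ the sign nonlinearity reaches the bound of 2, and at $\alpha = 2$ the linear system reaches the bound of 1 while the sign nonlinearity gives $2/\pi$.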
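An open question concerning SR is whether, in the asymptotic cases of weak signal and large sample size, SR can play a role in locally optimal processing. Here, based on the Fisher information inequalities, we demonstrate that SR is inapplicable to performance improvement for locally optimal processing. For a given observation $x(t) = s(t) + z(t)$, we add extra noise $v(t)$, independent of the initial noise $z(t)$ and the signal $s(t)$, to $x(t)$. Then, the updated data are $\hat{x}(t) = s(t) + z(t) + v(t) = s(t) + w(t)$. Here, the composite noise $w(t) = z(t) + v(t)$ has a convolved PDF
$$f_w(x) = \int_{-\infty}^{\infty} f_z(x - u)\,f_v(u)\,du, \tag{23}$$
where $f_v$ is the PDF of the noise $v(t)$. The weak signal $s(t)$ is now corrupted by the composite noise $w(t)$, so the performance measures of locally optimal processing in Eqs. (6), (13), (17), (19) and (21) must be evaluated with $I(f_w)$ (or $I_2(f_w)$) in place of $I(f_z)$ (or $I_2(f_z)$). It can be shown by the Cauchy-Schwarz inequality that [34]
$$I(f_w) \le \min(I(f_z), I(f_v)), \tag{24}$$
$$I_2(f_w) \le \min(I_2(f_z), I_2(f_v)). \tag{25}$$
This is because, if $I(f_z) \le I(f_v)$, then using $f_w'(x) = \int_{-\infty}^{\infty} f_z'(x - u)\,f_v(u)\,du$, the Cauchy-Schwarz inequality gives $[f_w'(x)]^2 \le f_w(x) \int_{-\infty}^{\infty} [f_z'^2(x - u)/f_z(x - u)]\,f_v(u)\,du$, and integrating over $x$ yields $I(f_w) \le I(f_z)$. The case $I(f_v) \le I(f_z)$ follows by exchanging the roles of $f_z$ and $f_v$, and the same argument applied to second derivatives leads to Eq. (25).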
Therefore, in asymptotic cases of weak signal and large sample size, Eqs. (24) and (25) show that SR cannot improve the performance of the above four locally optimal processing cases by adding more noise. However, the asymptotic limits of weak signal and large sample size are well delimited, and may not be met in practice. It is interesting to note that, under less restrictive conditions, noise-enhanced effects have been observed in fixed locally optimal detectors [9], suboptimal detectors [26,29], the optimal detector with finite sample sizes [11] or non-weak signals [11,25], soft-threshold systems [30] and the dead-zone limiter detector [28] by utilizing the constructive role of noise.
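The inequality $I(f_w) \le \min(I(f_z), I(f_v))$ for the composite noise can be checked numerically on a simple case. The sketch below is an illustration under assumptions, not the original proof: it takes Laplacian initial noise and Gaussian added noise, builds the convolved PDF on a grid by a Riemann sum, and estimates its Fisher information with central differences. The grid, the noise levels and all function names are illustrative choices.

```python
import math

def laplace_pdf(x, sigma=1.0):
    """Zero-mean Laplacian PDF with RMS amplitude sigma."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)

def gauss_pdf(x, sigma=1.0):
    """Zero-mean Gaussian PDF with RMS amplitude sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def convolved_pdf(f_z, f_v, xs, dx):
    """Riemann-sum approximation of f_w(x) = integral of f_z(u) f_v(x - u) du on the grid xs."""
    fz_vals = [f_z(u) for u in xs]
    return [dx * sum(fz * f_v(x - u) for fz, u in zip(fz_vals, xs)) for x in xs]

def fisher_information(pdf_vals, dx, floor=1e-12):
    """Approximate the integral of f'(x)^2 / f(x) using central differences.

    Grid points where the PDF is below `floor` are skipped to avoid
    dividing by values dominated by rounding errors.
    """
    total = 0.0
    for i in range(1, len(pdf_vals) - 1):
        if pdf_vals[i] > floor:
            slope = (pdf_vals[i + 1] - pdf_vals[i - 1]) / (2.0 * dx)
            total += slope * slope / pdf_vals[i]
    return total * dx

DX = 0.05
XS = [k * DX for k in range(-240, 241)]  # grid on [-12, 12]
SIGMA_Z, SIGMA_V = 1.0, 0.5              # RMS amplitudes of initial and added noise

f_w = convolved_pdf(lambda x: laplace_pdf(x, SIGMA_Z),
                    lambda x: gauss_pdf(x, SIGMA_V), XS, DX)
I_z = 2.0 / SIGMA_Z ** 2  # Fisher information of the Laplacian noise
I_v = 1.0 / SIGMA_V ** 2  # Fisher information of the Gaussian noise
I_w = fisher_information(f_w, DX)

print(I_z, I_v, I_w)
```

In this setting the estimated $I(f_w)$ falls below both $I(f_z)$ and $I(f_v)$, consistent with the statement that adding independent noise cannot raise the Fisher information of the composite noise.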
We here present an illustrative example of SR that occurs outside these restrictive conditions, where a suboptimal detector is adopted for Gaussian noise. Consider a generalized correlation

Figure 1. The output-input signal-to-noise ratio gain $G$. The output-input signal-to-noise ratio gain $G$ versus the exponential decay parameter $\alpha$ of the generalized Gaussian noise for the locally optimal nonlinearity $g_{opt}$ (solid line), the sign nonlinearity $g_S$ (red line) and the linear system $g_L$ (dotted line), respectively. doi:10.1371/journal.pone.0034282.g001

Figure 2. The normalized asymptotic efficacy $\xi_{GC}$. The normalized asymptotic efficacy $\xi_{GC}$ of the dead-zone limiter nonlinearity $g_{DZ}$ (solid line) and the linear system $g_L$ (red line) as a function of the RMS amplitude $\sigma_z$ of Gaussian noise ($\alpha = 2$). doi:10.1371/journal.pone.0034282.g002