Abstract
An important inferential task in functional linear models is to test the dependence between the response and the functional predictor. The traditional testing theory is built on functional principal component analysis, which requires estimating the covariance operator of the functional predictor. Due to the intrinsic high dimensionality of functional data, the sample is often not large enough to allow accurate estimation of the covariance operator, which leaves the follow-up test underpowered. To avoid the expensive estimation of the covariance operator, we propose a nonparametric method called Functional Linear models with U-statistics TEsting (FLUTE) to test the dependence assumption. We show that the FLUTE test is more powerful than the current benchmark methods (Kokoszka P, 2008; Patilea V, 2016) in the small or moderate sample case. We further prove the asymptotic normality of our test statistic under both the null hypothesis and a local alternative hypothesis. The merit of our method is demonstrated by both simulation studies and real examples.
Citation: Hu W, Lin N, Zhang B (2020) Nonparametric testing of lack of dependence in functional linear models. PLoS ONE 15(6): e0234094. https://doi.org/10.1371/journal.pone.0234094
Editor: Xiaofeng Wang, Cleveland Clinic Lerner Research Institute, UNITED STATES
Received: December 31, 2019; Accepted: May 18, 2020; Published: June 26, 2020
Copyright: © 2020 Hu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying this study are from the ‘CanadianWeather’ dataset, which is publicly available. The authors used the ‘CanadianWeather’ dataset by loading the R package ‘fda’ directly in the R program. The link of the R package ‘fda’ is available here: https://cran.r-project.org/web/packages/fda/ and the description of this dataset is here: https://www.rdocumentation.org/packages/fda/versions/5.1.4/topics/CanadianWeather. Another link to the dataset is here: http://www.psych.mcgill.ca/misc/fda/ex-weather-a1.html. Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. The authors did not have special access privileges.
Funding: Research was supported by the National Natural Science Foundation of China (Grant No.11671268). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Functional regression studies how a response variable Y varies with a functional predictor X(s), where Y can be scalar (Y ∈ ℝ) or functional (Y(t) ∈ L2([0, 1])). The space L2([0, 1]) denotes the Hilbert space of square integrable functions on [0, 1]. Without loss of generality, we define the indices s and t on the closed interval [0, 1]; if the raw support of s and t is a closed interval [a, b], one can simply rescale it to [0, 1]. In this paper, we assume that the data follow the widely used functional linear model (FLM) [1–4]. For a functional response, the FLM is defined as
$$Y(t) = \alpha(t) + \int_0^1 \beta(t, s) X(s)\,ds + \epsilon(t), \tag{1}$$
where both the intercept α(t) and the random error ϵ(t) are square integrable and independent of X(s), and the regression coefficient β(t, s) is in L2[0, 1] × L2[0, 1]. Denote $\mathcal{B}(X)(t) = \int_0^1 \beta(t, s) X(s)\,ds$, where $\mathcal{B}$ is the regression operator $\mathcal{B}: L^2[0, 1] \to L^2[0, 1]$. For a scalar response variable Y, the FLM has the simpler form $Y = \alpha + \int_0^1 \beta(s) X(s)\,ds + \epsilon$, where both the intercept α and the random error ϵ are real valued, and the regression parameter β(s) ∈ L2[0, 1]. Hereafter, we mainly focus on FLMs with functional responses, but the general methodology also applies to scalar responses.
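In this paper, we consider testing whether the regression operator $\mathcal{B}$ has an assigned structure $\mathcal{B}_0$, that is, to test
$$H_0: \mathcal{B} = \mathcal{B}_0 \quad \text{versus} \quad H_1: \mathcal{B} \neq \mathcal{B}_0. \tag{2}$$
In practice, people often focus on the special case with $\mathcal{B}_0 = 0$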
, i.e., to test the dependence between the response variable and the predictor. Existing tests in the literature for this problem can be categorized into parametric and nonparametric tests. In parametric tests, the test statistics are usually established by first estimating the functional regression coefficient through dimension reduction, such as functional PCA [5–8]. Methods for real-valued responses include [6], [7] and [8]. [6] used a test statistic based on the L2 norm of the empirical cross-covariance operator of (X, Y). [8] proposed a Wald-type test with varying thresholds in selecting the number of principal components. [7] developed four test statistics based on the functional principal component (FPC) scores, assuming normally distributed errors because the tests rely on the likelihood function. For functional responses, the test statistic proposed by [5] is constructed from the eigenvalues and eigenfunctions obtained by functional PCA of the response variable Y(t) and the predictor X(s). Such parametric methods require the costly estimation of the covariance operator of the predictor. Due to the intrinsic high dimensionality of functional data, the inaccuracy and numerical instability in the covariance operator estimation may render the parametric tests invalid, especially for small or moderate size samples. The same issue also occurs in high dimensional problems in multivariate statistics [9, 10]. Another limitation of FPC is that the principal component scores are computed from the predictor alone, without reference to the response. The directions that explain X(s) best may therefore not be the best predictors of the response, which may lead to disparate test results for the regression problem.
On the other hand, nonparametric tests use a different idea to avoid estimating the covariance operator [11, 12, 15]. For real-valued responses, [11] used the Nadaraya-Watson technique [13, 14] to estimate the conditional mean of
given X = x. [15] also proposed a nonparametric test based on a kernel function for real responses. However, this method still requires estimating the covariance operator to calculate the semimetric. Furthermore, this test needs to split the sample into three groups: one is used to estimate the kernel function, another to estimate the sample mean of the response variables, and the last to compute the test statistic, so the test is suited to large samples. The test proposed by [12] is for functional responses, and its test statistic is a weighted U-statistic with weights obtained from nearest neighbor smoothing. While this test possesses the correct Type-I error rate, identification of the neighbors requires defining distances between the functional predictors in the least favorable direction, which tends to result in lower power in general.
Motivated by [10], we propose a novel nonparametric test, called FLUTE, based on a U-statistic that measures the L2 distance in the induced space after transforming the original space of the functional predictor by the covariance operator. Our approach avoids explicit estimation of the covariance operator as it is based on the distance in the induced space. The FLUTE test can be applied to both real-valued and functional responses.
The paper is organized as follows. In Section Methodology, we introduce basic notation for the functional linear regression model and the FLUTE test statistic. After presenting the theory for functional responses in Section Asymptotic theory, we discuss the FLUTE test for FLMs with a scalar response in the following section. Section Simulation and real data reports results from simulation studies and a real data application. The last section concludes.
Methodology
Notation and assumptions
Let 〈⋅, ⋅〉 denote the inner product in L2[0, 1], that is, for any f1, f2 ∈ L2[0, 1], $\langle f_1, f_2 \rangle = \int_0^1 f_1(s) f_2(s)\,ds$. The L2 norm ‖⋅‖ is defined by ‖f‖² = 〈f, f〉. We assume in the FLM (1) that both the predictor X(s) and response variable Y(t) are random elements of L2[0, 1] and integrable. The sample functions Xi(s), i = 1, …, n, are independently and identically distributed (i.i.d.) with E[X(s)] = μx(s) and E‖X‖⁴ < ∞. We also assume that the random trajectories ϵi(t) are i.i.d. with E[ϵ(t)] = 0, E[ϵ²(t)] = σ²(t) and E‖ϵ‖⁴ < ∞.
Suppose {ϕk, k ≥ 1} and {ηℓ, ℓ ≥ 1} are orthonormal bases of the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. To simplify the presentation, hereafter we focus on the case where the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ are L2[0, 1]. Then we represent the predictor Xi(s) and the regression coefficient β(t, s) via the Karhunen-Loève decomposition [16]. For any s ∈ [0, 1], we have
$$X_i(s) = \mu_x(s) + \sum_{k=1}^{\infty} \xi_{ik} \phi_k(s), \tag{3}$$
where μx(s) is the mean function of the predictor Xi(s), and the expansion coefficient ξik = 〈Xi − μx, ϕk〉, with E[ξik] = 0. For any t, s ∈ [0, 1], the regression coefficient β(t, s) is represented as
$$\beta(t, s) = \sum_{\ell=1}^{\infty} \sum_{k=1}^{\infty} \beta_{\ell k}\, \eta_\ell(t)\, \phi_k(s), \tag{4}$$
where βℓk = ∬β(t, s)ηℓ(t)ϕk(s)dtds.
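Next we introduce the covariance operator $C$ of the predictor X(s) and its empirical counterpart $\hat{C}$. For any element f ∈ L2([0, 1]), we define
$$C(f) = E\big[\langle X - \mu_x, f \rangle (X - \mu_x)\big], \qquad \hat{C}(f) = \frac{1}{n} \sum_{i=1}^{n} \langle X_i - \bar{X}, f \rangle (X_i - \bar{X}),$$
where $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$. Denote the corresponding eigenelements by
$$C(\nu_k) = \lambda_k \nu_k, \quad k \ge 1,$$
with the eigenvalues λ1 ≥ λ2 ≥ … and νk the eigenfunction corresponding to λk.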
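Suppose that both the predictor X(s) and the response variable Y(t) are fully observed. Hereafter we assume that α(t) = 0 and μx(s) = 0, which will be justified in the next section. Then the FLM with a functional response (1) can be represented as
$$Y_i(t) = \sum_{\ell=1}^{\infty} \sum_{k=1}^{\infty} \beta_{\ell k}\, \xi_{ik}\, \eta_\ell(t) + \epsilon_i(t). \tag{5}$$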
In practice, the infinite expansions (3), (4) and (5) above are usually approximated by a truncated basis expansion (e.g. B-spline basis and Fourier basis) [4, 11, 16, 17]. If the functional variables are densely observed, then recovering each trajectory by the least squares method is straightforward [18, 19]. If the functional variables are sparsely observed, [20] and [21] proposed to estimate the FPC scores through a local linear surface smoother for the covariance operator, and then to approximate each trajectory using the first K eigenfunctions. For sparse observations, the approximation error cannot be ignored. Because establishing the asymptotic normality of the statistic is considerably more complex in that setting, we leave it for future investigation.
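We will represent the FLM with a functional response (1) using basis expansion when the approximation error is controlled. Denote by ei(t) the approximation error incurred by truncating the expansion in (5) at K and L terms, that is,
$$e_i(t) = \sum_{\ell=1}^{\infty} \sum_{k=1}^{\infty} \beta_{\ell k}\, \xi_{ik}\, \eta_\ell(t) - \sum_{\ell=1}^{L} \sum_{k=1}^{K} \beta_{\ell k}\, \xi_{ik}\, \eta_\ell(t). \tag{6}$$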
Similarly to Condition 1 in Appendix B of [22], by the Cauchy-Schwarz inequality, uniformly across all i = 1, …, n, we have (see Lemma 1), where C and
are two positive constants.
As K, L → ∞, the approximation error becomes negligible. Hence the FLM with a functional response (1) can be rewritten as
$$Y_i(t) = \sum_{\ell=1}^{L} \sum_{k=1}^{K} \beta_{\ell k}\, \xi_{ik}\, \eta_\ell(t) + \epsilon_i(t). \tag{7}$$
In this paper, we assume that both the predictor X(s) and response variable Y(t) are fully observed or the approximation error is controlled.
The FLUTE test
In this section, we introduce the FLUTE test whose test statistic is a U-statistic. The theory of U-statistics for fixed-dimensional data, pioneered by [23], has been well documented; see [24] and [25] for summaries. Recently, [10] developed the theory for high-dimensional multivariate data.
Under the functional linear model (1), if α(t) = 0 and E[X(s)] = 0, we can see that , which is then the perturbed L2 norm for measuring the distance between
and
. Further, it is easy to see that
(8)
where c(s, e) = E[X(s)X(e)]. Note that
is equivalent to the first term on the right-hand side of Eq (8) being zero. Thus we may consider testing the hypothesis (2) by a U-statistic with
as the kernel, whose expectation is
, where
.
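To make the role of the kernel concrete, the following worked calculation sketches why a product kernel of this type targets a squared distance between β and β0. It is a sketch under stated assumptions, not a reproduction of the paper's displays: it assumes α(t) = 0, E[X(s)] = 0, two independent copies (X1, Y1) and (X2, Y2), and the kernel 〈X1, X2〉〈Y1 − 𝓑0X1, Y2 − 𝓑0X2〉, which mirrors the scalar-response kernel stated later in the paper. The symbol δ = β − β0 is introduced here only for illustration.

```latex
% Assumptions (for this sketch only): \alpha(t) = 0, E[X(s)] = 0,
% \epsilon independent of X with mean zero, and \delta(t,s) = \beta(t,s) - \beta_0(t,s).
\begin{aligned}
Y_i(t) - (\mathcal{B}_0 X_i)(t)
  &= \int_0^1 \delta(t,e)\, X_i(e)\, de + \epsilon_i(t), \qquad i = 1, 2, \\
E\big[\langle X_1, X_2\rangle \,
      \langle Y_1 - \mathcal{B}_0 X_1,\; Y_2 - \mathcal{B}_0 X_2\rangle\big]
  &= \int_0^1\!\!\int_0^1
     E\Big[X_1(s)\!\int_0^1\! \delta(t,e) X_1(e)\, de\Big]\,
     E\Big[X_2(s)\!\int_0^1\! \delta(t,e) X_2(e)\, de\Big]\, ds\, dt \\
  &= \int_0^1\!\!\int_0^1
     \Big(\int_0^1 c(s,e)\, \delta(t,e)\, de\Big)^{2} ds\, dt ,
\end{aligned}
% where c(s,e) = E[X(s)X(e)]; the cross terms involving \epsilon_1 or \epsilon_2 vanish
% because the errors have mean zero and are independent of the predictors.
% In the truncated coordinates, with \Delta = (\beta_{\ell k} - \beta_{0,\ell k}) and
% \Sigma = var[\xi], the last expression equals tr(\Delta \Sigma^2 \Delta').
```

The final expression is zero when β = β0, and it is a squared L2 norm of the difference after it has been passed through the covariance kernel c(s, e), which is why no explicit inverse or eigen-decomposition of the covariance operator is needed.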
For the general case where α(t) ≠ 0 and E[X(s)] ≠ 0, we consider the U-statistic Tn,
(9)
where
, and
, with
, and
denotes combinations over all subscripts (i1, …, i4). As the statistic Tn is invariant to location shifts in both Xi(s) and Yi(t), without loss of generality, we assume that α(t) = 0 and μx(s) = 0 in the rest of the paper. Define θ(F) = E[ψ(i1, i2, i3, i4)], then E[Tn] = θ(F).
As the statistic Tn measures the distance between the regression operator and the assumed structure
under the null hypothesis, large values of the statistic Tn favor the alternative hypothesis and lead to rejection of the null hypothesis.
For the representation of the predictor X(s), we have
$$X(s) = \Phi(s)'\xi = \sum_{k=1}^{K} \xi_k \phi_k(s), \tag{10}$$
where Φ(s) = (ϕ1(s), …, ϕK(s))′, ξ = (ξ1, …, ξK)′, and var[ξ] = Σ. We next follow the general condition in [9] and assume that the loadings ξ of the predictor X(s) have a factor design structure.
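Assumption 1 There exists an m-variate random vector N = (N1, …, Nm)′ for some m < ∞ so that ξ = ΓN. Here Γ is a K × m matrix such that ΓΓ′ = Σ, and E[N] = 0, var[N] = Im, where Im is the m × m identity matrix. Each random variable Nℓ, ℓ = 1, …, m, is assumed to have finite 8th moment and $E[N_\ell^4] = 3 + \rho$ for some constant ρ ∈ [0, 1). Further, for any positive integers $\alpha_1, \dots, \alpha_d$ with $\sum_{\nu=1}^{d} \alpha_\nu \le 8$ and 1 ≤ m1 < m2 < … < md ≤ m, we assume
$$E\big[N_{m_1}^{\alpha_1} \cdots N_{m_d}^{\alpha_d}\big] = E\big[N_{m_1}^{\alpha_1}\big] \cdots E\big[N_{m_d}^{\alpha_d}\big].$$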
Assumption 1 allows the factors N to be weakly dependent. If the predictor X(s) follows a Gaussian process, [16] pointed out that X(s) admits the following expansion
$$X(s) = \sum_{k \ge 1} \sqrt{\lambda_k}\, N_k\, \nu_k(s)$$
with independent standard normal random variables Nk. It is easy to see that this is a special case of the factor design structure in Assumption 1, where the (a, b)th element of the K × m transformation matrix Γ is $\sqrt{\lambda_b}\, \langle \nu_b, \phi_a \rangle$.
Let εi = (εi1, …, εiL)′ be the vector of expansion coefficients of ϵi(t), and let Λ = var[εi]. We make the following assumption.
Assumption 2 For i ≠ j, and
.
Asymptotic theory
In this section, we derive the asymptotic unbiasedness of the FLUTE test and the asymptotic normality of its test statistic under both the null and a local alternative hypothesis through the Hoeffding decomposition.
Let Wi = (Xi(s), ϵi(t)), where . Thus, ψ(i1, i2, i3, i4) in Eq (9) can be represented as
. And ψc(w1, …, wc) = E[ψ(w1, …, wc, Wc + 1, …, W4)], be the projections of ψ to lower-dimensional sample spaces for c = 1, …, 4, where w1, …, wc are fixed variables (e.g.
,
,
). The specific forms are given in the Proof of Theorem 1 in the Appendix. Let vc = var[ψc] be the variance. Let
, then the Hoeffding decomposition of Tn is
, where
and
with
. The decomposition for the variance of Tn is
. We assume that E[ψ2(W1, …, W4)] exists. The proofs of the Hoeffding decompositions can be found in [23] and also [24]. [10] recently showed that the decomposition also holds when the dimension of the predictor K increases to infinity. Based on Proposition 1 in [10], if we find the minimum c′ such that
, c′ = 1, 2, or 3, is of the same order as v4, then Tn will be dominated by the first c′ terms, so that
(11)
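The way the variances vc enter var[Tn] can be illustrated with the textbook Hoeffding variance formula for a U-statistic with a symmetric kernel of order 4, var[Un] = C(n, 4)⁻¹ Σc C(4, c) C(n − 4, 4 − c) vc. The sketch below computes these exact weights; it assumes a symmetrized kernel and is not a reproduction of the paper's own display. The function name `hoeffding_weights` is hypothetical.

```python
from fractions import Fraction
from math import comb


def hoeffding_weights(n, order=4):
    """Exact weights w_c with var(U_n) = sum_c w_c * v_c (symmetric kernel assumed)."""
    if n < 2 * order:
        raise ValueError("n should be at least twice the kernel order")
    total = comb(n, order)
    return {
        c: Fraction(comb(order, c) * comb(n - order, order - c), total)
        for c in range(1, order + 1)
    }


# For large n the first two weights behave like 16/n and 72/n^2, so when
# v1 = o(v2 / n) the second term drives the variance of the statistic.
large_n = 10**6
weights = hoeffding_weights(large_n)
scaled_first = float(weights[1] * large_n)        # close to 16
scaled_second = float(weights[2] * large_n**2)    # close to 72
print(scaled_first, scaled_second)
```

This makes explicit why the condition v1 = o(n⁻¹v2) matters: the first weight is of order n⁻¹ and the second of order n⁻², so the second projection dominates only when v1 is sufficiently small.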
Theorem 1. Under the FLM (1) and Assumption 1, with K, L → ∞ as n → ∞, we have
- (i).
and
- (ii).
where and
See the proof of Theorem 1 in the Appendix.
Let Δ = (βℓk − β0,ℓk)ℓ,k, where β0,ℓk denote the loadings of β0(t, s). Let Ma = ΔΣ^aΔ′, a = 0, 1, 2, 3 (e.g. Σ^0 = IK, Σ^2 = ΣΣ), Q0 = Γ′Γ, Q1 = Γ′Δ′ΔΓ, Q2 = Γ′ΣΔ′ΔΓ, Q3 = Γ′ΣΓ, Q4 = Γ′Δ′ΔΣΔ′ΔΓ. Under H0, we have Δ = 0, and hence Q1 = Q2 = Q4 = 0 and Ma = 0 for a = 0, 1, 2, 3. It follows that v1 = 0, so Tn is a degenerate U-statistic. In this case, we have
Next we show that the form of the variance for Tn also holds under a subclass of local alternative hypothesis specified by the following condition,
(12)
Under the null hypothesis, the equation v1 = o(n⁻¹v2) holds with v1 = 0. Under the local alternative, the equation v1 = o(n⁻¹v2) still holds (see Appendix). The following theorem then states the asymptotic normality of our test statistic under this local alternative.
Theorem 2. Under the FLM (1) and Assumptions 1 and 2, under either the null hypothesis or the local alternatives
, as n → ∞, we have
See also the proof of Theorem 2 in the Appendix.
For real data, the traces tr(Σ²) and tr(Λ²) need to be estimated. We use the estimator given in Chen and Qin [26], which was shown to be unbiased and ratio consistent, i.e. $\widehat{\mathrm{tr}(\Sigma^2)} / \mathrm{tr}(\Sigma^2) \to 1$ in probability, under the null hypothesis or the local alternatives. Specifically, the estimator is given as
(13)
where
, and
with
. Following the same idea, we can also construct a consistent estimator of tr(Λ2) under H0.
Following Theorem 2, the FLUTE test rejects H0 at significance level α if
where zα is the upper α−quantile of N(0, 1).
Theorem 2 also implies that the asymptotic power of the proposed statistic under the local alternative is
The quantity
can be viewed as a signal to noise ratio (SNR). If rn(β − β0, Σ, Λ) = o(1), the power converges to α. If rn(β − β0, Σ, Λ) → ∞, the power converges to 1.
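The dependence of the power on the SNR can be illustrated numerically. The sketch below assumes that the asymptotic power takes the usual one-sided normal form Φ(−zα + rn); this form is an assumption made for illustration, since the paper's own power display is not reproduced here, and the function name `asymptotic_power` is hypothetical.

```python
from statistics import NormalDist


def asymptotic_power(alpha, snr):
    """Power of a one-sided normal test, assuming the form Phi(-z_alpha + snr)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper alpha-quantile of N(0, 1)
    return std_normal.cdf(-z_alpha + snr)


# The power starts at the nominal level when the SNR vanishes and
# increases towards 1 as the SNR grows.
for snr in (0.0, 1.0, 2.0, 4.0, 8.0):
    print(snr, round(asymptotic_power(0.05, snr), 4))
```

Under this assumed form, a vanishing SNR leaves the power at the nominal level α, and the power approaches 1 only as the SNR grows without bound.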
FLUTE for scalar responses
In the FLM with a scalar response,
$$Y = \alpha + \int_0^1 \beta(s) X(s)\,ds + \epsilon, \tag{14}$$
where Y ∈ ℝ. The null hypothesis for the scalar response is defined as
$$H_0: \beta(s) = \beta_0(s).$$
The idea of the FLUTE method in Section Asymptotic theory directly applies and only requires slight modification toward the dimension of the response and functional regression coefficients. For example, the kernel of the FLUTE statistic is (Yi − 〈Xi, β0〉)〈Xi(s), Xj(s)〉(Yj − 〈Xj, β0〉) with expectation
, where
. The expansion of parameter β(t) is
. The theory can be developed using the same idea as in Section Asymptotic theory. We distinguish by denoting the counterpart to notations in Section Asymptotic theory with a check mark. For example, the kernel of the FLUTE statistic for scalar response model (14) is denoted by
, and its variance is
. The following theorems show that the same asymptotic null distribution as in Theorems 1 and 2 holds for the scalar response case.
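As an illustration of the scalar-response kernel (Yi − 〈Xi, β0〉)〈Xi, Xj〉(Yj − 〈Xj, β0〉), the sketch below averages it over all pairs i < j for curves sampled on a uniform grid. It is a simplified second-order version that presumes centered data (α = 0 and μx = 0); the paper's statistic Tn is a fourth-order U-statistic that removes location effects and is not reproduced here. The names `inner_product` and `scalar_kernel_mean` are hypothetical, and the inner product is approximated by a plain grid average.

```python
from itertools import combinations


def inner_product(f, g):
    """Approximate <f, g> on [0, 1] from values on a common uniform grid."""
    if len(f) != len(g):
        raise ValueError("curves must be sampled on the same grid")
    return sum(a * b for a, b in zip(f, g)) / len(f)


def scalar_kernel_mean(x_curves, y_values, beta0):
    """Average of r_i * <X_i, X_j> * r_j over pairs i < j, with r_i = Y_i - <X_i, beta0>."""
    residuals = [y - inner_product(x, beta0) for x, y in zip(x_curves, y_values)]
    pairs = list(combinations(range(len(x_curves)), 2))
    total = sum(
        residuals[i] * inner_product(x_curves[i], x_curves[j]) * residuals[j]
        for i, j in pairs
    )
    return total / len(pairs)


# Toy example: three constant curves on a 4-point grid, testing beta0 = 0.
grid_size = 4
x_curves = [[1.0] * grid_size, [2.0] * grid_size, [0.0] * grid_size]
y_values = [1.0, 2.0, 3.0]
beta0 = [0.0] * grid_size
example_value = scalar_kernel_mean(x_curves, y_values, beta0)
print(example_value)
```

In the toy example only the pair of the first two curves contributes, because the third curve is identically zero and therefore has zero inner product with the others.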
Theorem 3. Under the FLM with a scalar response (14) and Assumption 1, when K → ∞ as n → ∞, we have
- (i).
and
- (ii).
We consider the local alternative hypothesis as follows.
and
where
.
Theorem 4. In the FLM with a scalar response (14), assume Assumption 1 and that E[ϵ⁴] is finite. Under either the null hypothesis or the local alternatives
, as n → ∞, we have
where σ2 = var[ϵ].
The proofs of Theorems 3 and 4 are omitted because they can be proved in the same way as Theorems 1 and 2 except with slight modification to the notations to reflect the difference in dimensionality.
Simulation and real data
In this section, we demonstrate the performance of the FLUTE method through simulation studies and an application to a real data example. For cases with functional responses, we compare the FLUTE method with two competitors: the method in [1], which is based on functional PCA and which we call KMSZ, and the nonparametric test in [12], which is based on a weighted U-statistic and which we call NP. The KMSZ method depends on the functional principal components and is better suited to large samples, where the covariance operator can be estimated well. The test statistic of the NP method depends on the so-called least-favorable direction γ, which makes it better suited to low-dimensional cases. Under the simulation setup, this direction γ can be determined in three different ways: 1) pre-estimate γ from a very large simulated data set and then use it for all simulated data sets; 2) pre-estimate γ from the data set generated at each level of |β|2 and then use it for simulated data sets generated at the same level of |β|2; and 3) estimate γ from each simulated data set. See Table 1 for the simulation results; more details can be found in Supplementary Material A in S1 File. Results reported in this section are based on the second way, which is consistent with applications of the NP method to real data.
For FLMs with a scalar response, neither the KMSZ nor the NP method is applicable, because the former involves functional PCA on the response and the latter requires computing the L2 norm between two functional response values. The nonparametric test proposed by [15], which we call NETRF, is designed for scalar responses. However, the NETRF test still requires estimating the covariance operator to calculate the semimetric. Furthermore, this test needs a sufficiently large sample to provide accurate estimates within each group. Therefore, we do not directly compare Delsol's method with our FLUTE test, but we conduct simulation studies for small/moderate sample cases to demonstrate the limitations of Delsol's method under these scenarios. As the comparison benchmark we choose the F-test proposed by [7]. [7] actually proposed four asymptotically equivalent tests, which also depend on the functional principal components and can be more suitable for large samples. We use the F-test because it performs the best of the four tests for small to moderate samples.
Simulation results
We next conduct a simulation study to evaluate the empirical size and power of our FLUTE test for small to moderate samples with sample size n varying between 40 and 100. In each simulation, we generate 1,000 Monte Carlo samples. Our computer codes are written in R. For basis expansion and functional PCA, we use the implementation in the R package fda.
Functional response.
First we present the case of the FLM with functional responses. This simulation design follows the FLM (1), where we set β(t, s) = |β|₂ exp{(t² + s²)/2}. Here |β|₂ indicates the L2 norm of β(t, s) and is used to control the SNR. We generate the functional predictor Xi(s) according to Eq (10), where the bases are chosen as Fourier bases. For instance, the first five orthonormal Fourier basis functions are ϕ1(s) = 1, ϕ2(s) = √2 sin(2πs), ϕ3(s) = √2 cos(2πs), ϕ4(s) = √2 sin(4πs), and ϕ5(s) = √2 cos(4πs)
. Without loss of generality, we set the mean μx(s) = 0. According to the factor design (Assumption 1), the loadings ξ1, …, ξn are independently generated from the following moving average model,
$$\xi_{ik} = \rho_1 N_{ik} + \rho_2 N_{i(k+1)} + \cdots + \rho_T N_{i(k+T-1)}, \quad k = 1, \dots, K, \tag{15}$$
where the constant T controls the range of dependence (see Fig 1). The coefficients ρ1, …, ρT are randomly generated from U(0, 1), where U(a, b) denotes the uniform distribution on the interval (a, b). The random vectors Ni = (Ni1, …, Ni(K+T−1))′ are independently generated from the N(0, IK+T−1) distribution. It then follows that the (k, ℓ)th element of the covariance matrix var(ξ) is
$$\sum_{t=1}^{T - |k - \ell|} \rho_t\, \rho_{t + |k - \ell|}\; I(|k - \ell| < T),$$
which shows that the correlation between ξik and ξiℓ is controlled by |k − ℓ| and T. The random error function εi(t) is generated according to the decomposition in Eq (10). We also set the bases {ηℓ(t)} in the same way as {ϕk(s)}, and the loading vectors εi = (εi1, …, εiL)′ are i.i.d. N(0, Σϵ).
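The link between the moving average design and the factor structure ξ = ΓN of Assumption 1 can be checked numerically. The sketch below assumes the moving average takes the form ξik = ρ1Nik + … + ρTNi(k+T−1), builds the corresponding K × (K + T − 1) loading matrix Γ, and compares ΓΓ′ with the banded closed form Σt ρtρt+|k−ℓ| for |k − ℓ| < T. All function names are hypothetical and the coefficients are drawn from U(0, 1) with an arbitrary seed.

```python
import random


def ma_loading_matrix(rho, K):
    """K x (K+T-1) matrix Gamma such that xi = Gamma N for the assumed MA model."""
    T = len(rho)
    gamma = [[0.0] * (K + T - 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            gamma[k][k + t] = rho[t]  # xi_k uses N_k, ..., N_{k+T-1}
    return gamma


def covariance_from_loading(gamma):
    """Sigma = Gamma Gamma' when var[N] is the identity."""
    K = len(gamma)
    return [
        [sum(a * b for a, b in zip(gamma[k], gamma[l])) for l in range(K)]
        for k in range(K)
    ]


def ma_covariance_closed_form(rho, K):
    """Banded covariance: sum_t rho_t * rho_{t+lag} if lag < T, else 0."""
    T = len(rho)
    sigma = [[0.0] * K for _ in range(K)]
    for k in range(K):
        for l in range(K):
            lag = abs(k - l)
            if lag < T:
                sigma[k][l] = sum(rho[t] * rho[t + lag] for t in range(T - lag))
    return sigma


rng = random.Random(0)
rho = [rng.uniform(0.0, 1.0) for _ in range(5)]  # T = 5 coefficients from U(0, 1)
sigma_factor = covariance_from_loading(ma_loading_matrix(rho, K=11))
sigma_closed = ma_covariance_closed_form(rho, K=11)
max_gap = max(
    abs(sigma_factor[k][l] - sigma_closed[k][l]) for k in range(11) for l in range(11)
)
print(max_gap)
```

Entries with |k − ℓ| ≥ T are exactly zero, which is how T controls the range of dependence among the loadings.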
To evaluate the impact of dimensionality and sample size, we carry out simulations under four different settings that combine two dimensionalities, K = L = 5 (low-dimensional) and K = L = 11 (high-dimensional), with two sample sizes, n = 40 and 100. When generating X(s), we set T = 5 in Eq (15). The loadings εi1, …, εiL have equal variances, with Σϵ = IL. Under each setting, we vary |β|2 over 10 levels from 0 to 0.5 (see Tables 2 and 3). When |β|2 is 0, the rejection rate gives the empirical size of each test, and the results at the other 9 levels give the power. Each testing method is evaluated at two nominal significance levels α = 0.05 and 0.1.
Table 2 shows the empirical sizes and power obtained for different dimensionality and sample size under the nominal significance level α = 0.05. Under the same sample size, the power of all three tests decreases as the dimensions K and L increase. When the dimensionality is the same, the power of all three methods improves as the sample size increases. Table 2 also shows that the FLUTE method performs stably in both the low-dimensional and the high-dimensional cases. The KMSZ and NP tests are conservative and their power decreases significantly as the dimension increases.
Further, the NP method has almost no power in the case of high dimension and small sample size (K = L = 11, n = 40). It is apparent from Table 2 that the power of the FLUTE method is consistently higher than that of the KMSZ and NP methods, especially in high-dimensional cases. The simulation results also show that the FLUTE method respects the nominal levels under high dimensionality at both sample sizes n = 40 and 100.
Table 3 shows the results under the nominal significance level α = 0.1, and provides the same conclusion as Table 2.
To evaluate the impact of the correlation structure, we carry out simulations under two different settings, T = 5 (weakly correlated) and T = 11 (strongly correlated) (see Fig 1). Under each setting, we vary |β|2 over 6 levels from 0 to 0.5 (see Table 4). Table 4 shows the empirical sizes and power obtained for the case of K = L = 11 and n = 40. The FLUTE method is stable across different T, whereas both the KMSZ and the NP methods are more sensitive to the correlation structure. The power of the NP statistic decreases significantly when T increases, since this method needs to search for the least-favorable direction. In contrast, the power of the KMSZ statistic decreases significantly when T decreases, since this method depends on functional PCA: when the correlation is weak, more functional PCs are needed to explain the same percentage of variance, so the number p of retained components increases, which results in lower power.
To evaluate the performance of the FLUTE method with heteroscedastic variance, we carry out simulations in which the variances of the expansion coefficients of εi(t) are Var(εiℓ) = 1/ℓ, for i = 1, …, n, ℓ = 1, …, L. We set T = 5 and n = 40, and we vary |β|2 over 6 levels from 0 to 0.5. The significance levels are α = 0.05 and α = 0.1. Table 5 shows the empirical sizes and power for the cases K = L = 5 and K = L = 11, and leads to conclusions similar to those from Tables 2 and 3 when n = 40. The power of all three tests decreases as the dimensions K and L increase. However, the FLUTE method performs stably in both the low-dimensional and the high-dimensional cases, whereas the power of the KMSZ and NP tests decreases significantly as the dimension increases.
Fig 2 shows the histograms of the FLUTE statistic for different dimensionality and sample size under the null hypothesis, which match the superimposed standard normal density well. This is consistent with our results in Theorem 2.
The solid line indicates the density of the standard normal distribution.
Fig 3 shows the power curves of the FLUTE statistic under four different cases with varying dimensionality and sample size at the nominal significance levels α = 0.05 and α = 0.1. In all four cases the empirical size is well controlled, and by |β|2 = 0.2 all four power curves have almost reached 1.
Case 1: K = L = 5 and n = 40; Case 2: K = L = 11 and n = 40; Case 3: K = L = 5 and n = 100; Case 4: K = L = 11 and n = 100. The left figure is for α = 0.05, and the right is for α = 0.1.
Scalar response.
This section presents the results for FLMs with scalar responses. This simulation design follows the model (14). We set the coefficient of regression parameter as . The functional predictor X(t) is generated in the same way as in Section Functional response. And the random errors εi are independently generated from N(0, 1).
As for FLMs with functional responses in Section Functional response, we carry out simulations under four different settings, K = 5 (low-dimensional) and K = 11 (high-dimensional), n = 40 and 100. Under each setting, we vary |β|2 over 10 levels from 0 to 0.5 (see Tables 6 and 7). Each testing method is evaluated at two nominal significance levels α = 0.05 and 0.1.
Table 6 shows the empirical sizes and power obtained for different dimensionality and sample size under the nominal significance level α = 0.05. The power of the two tests, FLUTE and the F-test in [7], is similar in these four cases, with the FLUTE test the more powerful of the two. Table 7 leads to the same conclusions at the nominal significance level α = 0.1.
Table 8 compares FLUTE with Delsol's method at significance level α = 0.1 and K = 11. NETRF1, NETRF2 and NETRF3 stand for the three bootstrap methods in Delsol's paper. Because the three test statistics are nonparametric and constructed from a kernel function, the bias and variance terms are difficult to estimate. Further, it is usually inappropriate to use the quantiles of the asymptotic law to estimate the threshold directly. Thus, a bootstrap procedure is needed. For all three methods, we choose the semi-metric induced by functional principal components, and split the sample into three groups of sizes 20, 10 and 10 when n = 40, and 40, 30 and 30 when n = 100. Under each setting, the empirical significance levels are calculated from 1000 bootstrap iterations. FLUTE stands for our method. The sizes of Delsol's methods clearly cannot be well controlled at the nominal level for small/moderate samples.
Application to Canadian Weather data
The Canadian Weather data are available in the R package fda (http://www.r-project.org) under the name CanadianWeather. The data consist of the daily temperature and rainfall registered at 35 weather stations in Canada, averaged over 1960 to 1994; hence the sample size is 35. We view the daily temperature as the predictor and the rainfall as the response variable. Both the predictor and the response variable are functional. We use the FLUTE test to check the dependence between the daily temperature and the rainfall. Following [3], we choose 11 Fourier bases to fit the temperature curve and the rainfall curve for each station separately.
Let Yi(t) represent the logarithm of the rainfall at station i at time t and Xi(t) be the temperature at the same station at time t of the year. The value of the FLUTE statistic is 12.17159 based on all 35 stations, hence we reject the null hypothesis. To illustrate the efficacy of the test, we repeat the test on 1000 bootstrap samples. Each bootstrap sample consists of data at 35 stations selected randomly with replacement from the 35 stations. Fig 4 shows that the density of the FLUTE statistic is far away from the standard normal distribution, which again supports rejecting the null hypothesis.
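The decision step can be sketched as follows, assuming that the reported value 12.17159 is the standardized statistic that is compared with the upper α-quantile of N(0, 1). The snippet computes the one-sided p-value and draws one set of bootstrap station indices; it does not recompute the FLUTE statistic itself, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def upper_tail_p_value(z):
    """P(N(0,1) > z), computed with erfc so that tiny tails do not round to 0."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rejects_null(z, alpha):
    """One-sided rule: reject when z exceeds the upper alpha-quantile of N(0, 1)."""
    return z > NormalDist().inv_cdf(1.0 - alpha)


def bootstrap_indices(n, rng):
    """Indices of n stations drawn with replacement from 0, ..., n-1."""
    return [rng.randrange(n) for _ in range(n)]


observed = 12.17159                      # value reported in the text
p_value = upper_tail_p_value(observed)
decision = rejects_null(observed, alpha=0.05)
resample = bootstrap_indices(35, random.Random(2020))  # arbitrary seed
print(p_value, decision, resample[:5])
```

In a full bootstrap study the statistic would be recomputed on the curves indexed by each resample, and the resulting values compared with the standard normal density as in Fig 4.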
Conclusion
We proposed the FLUTE test for testing dependence between the response and the functional predictor in FLMs with either a real or a functional response. By constructing a U-statistic that measures the L2 distance in an induced space, the FLUTE statistic avoids estimating the covariance operator of the predictor. The parametric test in [1] requires estimation of the covariance operator and demands large samples. The nonparametric test in [12], although it avoids explicitly estimating the covariance operator, requires estimating the least-favorable direction γ. In general, using the least-favorable direction leads to lower power. Meanwhile, our experience suggests that the estimation of γ can be numerically unstable across different simulated data sets, which results in poor test performance.
Our FLUTE test does not suffer from these problems. It requires minimal effort in estimating model parameters and hence achieves higher power, especially in high-dimensional cases. One potential weakness of the FLUTE test is the high computational cost of evaluating a U-statistic in large samples. However, estimating the covariance operator is less of a concern in large samples, so one can switch to parametric methods there. We therefore recommend the FLUTE test for small to moderate sample problems.
Appendix
Proof of Theorems.
Lemma 1. Suppose the functional predictors {Xi, i = 1, …, n} and the regression function β(t, s) satisfy the following two conditions,
- (A). The functional predictors {Xi, i = 1, …, n} belong to a Sobolev ellipsoid of order two: there exists a universal constant C such that
for all i = 1, …, n.
- (B). The regression functions satisfy
with some constant
. Further as L → ∞, the summation of coefficients
for k = 1, 2, ….
then we have the approximation error .
Proof. Recall that
Then by the Cauchy-Schwarz inequality, we have
Next we show that the three parts are controlled separately. According to the Hölder inequality and Condition (A), we have
(16)
And we have
(17)
Similarly to the proof of Eqs (16) and (17), we get
(18)
Hence we complete the proof by combining the bounds on each of the three parts.
Next, to prove Theorems 1 and 2, we first introduce some lemmas.
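Lemma 2. Suppose random vectors Zi = (Zi1, …, Zip)′ ∈ ℝ^p, i = 1, 2, satisfy E[Zi] = 0, var[Zi] = Ip,
$E[Z_{ik}^4] = 3 + \rho$ for k = 1, …, p, where ρ is a constant in (0, 1). If the two random vectors Z1 and Z2 are independent, then for any square matrix M = (mkℓ)p×p, we have
- (1). $E[Z_1 Z_1' M Z_1 Z_1'] = M + M' + \mathrm{tr}(M) I_p + \rho\, \mathrm{diag}(M)$;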
- (2).
- (3).
Proof.
- (1). Let
, where W1(k, ℓ) indicates the (k, ℓ)th element of W1. With direct computation, we have W1(k, ℓ) = Z1k Z1ℓ∑i,j mijZ1i Z1j. If k = ℓ,
If k ≠ ℓ, E[W1(k, ℓ)] = mkℓ + mℓk. Then E[W1] = M + M′+ tr(M)Ip + ρdiag(M).
- (2). Since
, and
, then we have
- (3). It’s simple to show that
.
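Part (1) of Lemma 2 can be checked exactly on a small example. The sketch below reads W1 as Z1 Z1′ M Z1 Z1′, which matches the entry formula W1(k, ℓ) = Z1k Z1ℓ ∑i,j mij Z1i Z1j given in the proof. It further assumes that the coordinates of Z1 are independent with fourth moment 3 + ρ; this is only one reading of the moment condition, which is not legible in the statement above. The three-point distribution, the value ρ = 0.5, and the matrix M are hypothetical choices made so that the expectation can be computed by enumeration instead of simulation.

```python
from itertools import product
from math import sqrt

RHO = 0.5  # hypothetical value in (0, 1)
P = 3      # hypothetical dimension, kept small so enumeration is cheap

# Symmetric three-point law with E[Z] = 0, E[Z^2] = 1, E[Z^4] = 3 + RHO.
A = sqrt(3.0 + RHO)
Q = 1.0 / (3.0 + RHO)
ATOMS = [(-A, 0.5 * Q), (0.0, 1.0 - Q), (A, 0.5 * Q)]

# Arbitrary non-symmetric square matrix M (illustrative values).
M = [[1.0, 2.0, -1.0],
     [0.5, 3.0, 4.0],
     [-2.0, 1.0, 0.0]]


def expected_w1(m):
    """Exact E[Z Z' M Z Z'] by enumerating all 3**p outcomes of Z."""
    p = len(m)
    out = [[0.0] * p for _ in range(p)]
    for outcome in product(ATOMS, repeat=p):
        z = [value for value, _ in outcome]
        prob = 1.0
        for _, weight in outcome:
            prob *= weight
        quad = sum(m[i][j] * z[i] * z[j] for i in range(p) for j in range(p))
        for k in range(p):
            for l in range(p):
                out[k][l] += prob * z[k] * z[l] * quad
    return out


def closed_form(m, rho):
    """M + M' + tr(M) I + rho * diag(M), the expression stated in the proof."""
    p = len(m)
    tr = sum(m[i][i] for i in range(p))
    return [[m[k][l] + m[l][k] + (tr + rho * m[k][k] if k == l else 0.0)
             for l in range(p)] for k in range(p)]


lhs = expected_w1(M)
rhs = closed_form(M, RHO)
max_gap = max(abs(lhs[k][l] - rhs[k][l]) for k in range(P) for l in range(P))
```

Under the stated assumptions `max_gap` is zero up to floating-point rounding, in agreement with E[W1] = M + M′ + tr(M)Ip + ρdiag(M).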
Lemma 3. For symmetric and positive semidefinite matrices A and B, the following inequalities hold [27]:
- (1). tr{(AB)²} ≤ tr(A²)tr(B²);
- (2). tr²(AB) ≤ tr(A²)tr(B²).
Lemma 4. For matrices Ma, a = 1, 2, 3, defined as Ma = ΔΣ^aΔ′, we have tr²(M2) ≤ tr(M1)tr(M3).
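The trace inequalities in Lemmas 3 and 4 can be spot-checked numerically. The sketch below reads the first inequality of Lemma 3 as a bound on the trace of (AB) squared, and reads Ma in Lemma 4 as Δ times the a-th power of Σ times Δ′ with Σ symmetric and positive semidefinite; both readings are assumptions. Under the second reading, Lemma 4 follows from the Cauchy-Schwarz inequality for the Frobenius inner product applied to Σ^{1/2}Δ′ and Σ^{3/2}Δ′. The dimensions, the random matrices, and the helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def random_psd(p, rng):
    # G G' is symmetric and positive semidefinite for any real G.
    g = random_matrix(p, p, rng)
    return matmul(g, transpose(g))


def lemma3_holds(a, b, tol=1e-9):
    """Check tr((AB)^2) <= tr(A^2)tr(B^2) and tr(AB)^2 <= tr(A^2)tr(B^2)."""
    ab = matmul(a, b)
    bound = trace(matmul(a, a)) * trace(matmul(b, b))
    slack = tol * (1.0 + abs(bound))  # allowance for rounding error
    return (trace(matmul(ab, ab)) <= bound + slack
            and trace(ab) ** 2 <= bound + slack)


def lemma4_holds(delta, sigma, tol=1e-9):
    """Check tr(M2)^2 <= tr(M1)tr(M3) with Ma = delta sigma^a delta'."""
    dt = transpose(delta)
    s2 = matmul(sigma, sigma)
    s3 = matmul(s2, sigma)
    m1 = matmul(matmul(delta, sigma), dt)
    m2 = matmul(matmul(delta, s2), dt)
    m3 = matmul(matmul(delta, s3), dt)
    bound = trace(m1) * trace(m3)
    return trace(m2) ** 2 <= bound + tol * (1.0 + abs(bound))


rng = random.Random(0)  # fixed seed so the check is reproducible
results = []
for _ in range(200):
    a, b, sigma = random_psd(4, rng), random_psd(4, rng), random_psd(4, rng)
    delta = random_matrix(3, 4, rng)  # hypothetical 3 x 4 shape for Δ
    results.append(lemma3_holds(a, b) and lemma4_holds(delta, sigma))
all_hold = all(results)
```

A random spot check of this kind is not a proof; it only guards against misreading the flattened exponents in the statements above.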
Proof of Theorem 1.
Recalling the definition of the statistic in Eq (9), it is straightforward to show that .
To find the dominating terms, we need to calculate the following projections,
Based on the expansion of Xi(t) and the orthogonality of the bases, we can derive the variance of the projections vc.
With straightforward calculations, we get
Here, the Hadamard product is defined as A ∘ B = (aijbij) for matrices A = (aij) and B = (bij). Since both variances v2 and v4 are linear combinations of
, tr2(M2), tr(M1 M3), tr(ΛM3),
, tr(ΛM1)tr(Σ2), tr(Q2 ∘ Q2), tr(Q0 ∘ Q1)2, tr(Γ′Δ′ΛΔΓ ∘ Q3), tr(Q3 ∘ Q4), and tr(Σ2), they are of the same asymptotic order. This means that the statistic Tn is dominated by the first two terms corresponding to Vn1 and Vn2. Hence we can get the Hoeffding decomposition (11) of Tn,
and
This completes the proof.
Proof of Theorem 2.
Using the inequalities in Lemma 3, under either the null hypothesis or the local alternative, we have
(19)
(20)
Thus v1 = o(n⁻¹ v2).
Define
(21)
where δβ = β − β0. Then we can get
We have
which can be regarded as a U-statistic with the kernel
Through direct calculation, we can get the projections of Ψ,
,
, and
. By Hoeffding’s variance decomposition, we have
,
(see Supplementary Material B in S1 File).
Because
we only need to show that
From Eq (21) and the form of
, let
, where
and
Under the assumptions of this theorem and following Eqs (19) and (20) we have
and
To complete the proof, we now need to show
Define
and
, thus
, which we define as
. Let
be a σ−field generated by
. It is easy to see that
,
. This shows that
is a zero-mean martingale. Let
,
. The martingale central limit theorem [28] applies if we can show that
satisfies the following two conditions:
(22)
and, for all τ > 0,
We have
and
Hence we can define
, where
It can be shown that E[Cn1] = 1, and
As tr(Σ⁴) = O(tr²(Σ²)), and var[Cn1] → 0. Then
(see Supplementary Material B in S1 File). Similarly, E[Cn2] = 0,
, then
In summary, Eq (22) holds.
Since , by the law of large numbers, the last step is to prove
(23)
We have
, and
, thus under Assumption (16), we have (see details in S1 File)
Hence Eq (23) holds, which completes the proof.
References
- 1. Ramsay JO and Silverman BW. Functional Data Analysis. New York: Springer; 2005.
- 2. Yao F, Müller HG, and Wang J. Functional linear regression analysis for longitudinal data. The Annals of Statistics. 2005;33(6):2873–2903.
- 3. Malfait N and Ramsay JO. The historical functional linear model. The Canadian Journal of Statistics. 2003 Jun;31(2):115–128.
- 4. Chiou JM, Yang Y, and Chen Y. Multivariate functional linear regression and prediction. Journal of Multivariate Analysis. 2016 Apr;146:301–312.
- 5. Kokoszka P, Maslova I, Sojka J, Zhu L. Testing for lack of dependence in the functional linear model. Canadian Journal of Statistics. 2008 Jun;36(2):207–222.
- 6. Cardot H, Ferraty F, Mas A, and Sarda P. Testing hypotheses in the functional linear model. Scandinavian Journal of Statistics. 2003 Mar;30(1):241–255.
- 7. Kong D, Staicu AM, Maity A. Classical testing in functional linear models. Journal of Nonparametric Statistics. 2016;28(4):813–838. pmid:28955155
- 8. Su Y, Di C, and Hsu L. Hypothesis testing in functional linear models. Biometrics. 2017 Jun;73(2):551–561. pmid:28295175
- 9. Bai Z, Saranadasa H. Effect of high dimension: by an example of a two sample problem. Statistica Sinica. 1996;6:311–329.
- 10. Zhong P and Chen S. Tests for high-dimensional regression coefficients with factorial designs. Journal of the American Statistical Association. 2011 Jan;106(493):260–274.
- 11. Delsol L, Ferraty F, and Vieu P. Structural test in regression on functional variables. Journal of Multivariate Analysis. 2011 Mar;102(3):422–447.
- 12. Patilea V, Sellero CS, and Saumard M. Testing the predictor effect on a functional response. Journal of the American Statistical Association. 2016 Jul;111(516):1684–1695.
- 13. Nadaraya EA. On estimating regression. Theory of Probability & Its Applications. 1964; 9(1):141–142.
- 14. Watson GS. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A. 1964;26:359–372.
- 15. Delsol L. No effect tests in regression on functional variable and some applications to spectrometric studies. Computational Statistics. 2013;28(4):1775–1811.
- 16. Horváth L and Kokoszka P. Inference for Functional Data with Applications. New York: Springer; 2012.
- 17. Shin H and Lee MH. On prediction rate in partial functional linear regression. Journal of Multivariate Analysis. 2012 Jan;103(1):93–106.
- 18. Ramsay JO and Dalzell CJ. Some tools for functional data analysis. Journal of the Royal Statistical Society. Series B (Methodological). 1991; 53(3):539–572.
- 19. Cardot H, Ferraty F, and Sarda P. Spline estimators for the functional linear model. Statistica Sinica. 2003 Jul;13:571–591.
- 20. Yao F, Müller HG, and Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100(470):577–590.
- 21. Yao F, Lei E, Wu Y. Effective dimension reduction for sparse functional data. Biometrika. 2015 Jun;102(2):421–437. pmid:26566293
- 22. Fan Y, James GM, and Radchenko P. Functional additive regression. The Annals of Statistics. 2015 Oct;43(5):2296–2325.
- 23. Hoeffding W. A Class of Statistics with Asymptotically Normal Distribution. The Annals of Mathematical Statistics. 1948 Sep;19:293–325.
- 24. Serfling RJ. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.
- 25. Lee AJ. U-statistics: Theory and Practice. New York: Marcel Dekker; 1990.
- 26. Chen SX and Qin YL. A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics. 2010;38(2):808–835.
- 27. Bellman R. Some Inequalities for Positive Definite Matrices. Springer; 1980.
- 28. Hall P and Heyde CC. Martingale Limit Theory and Its Application. New York: Academic Press; 1980.