Estimating the local false discovery rate via a bootstrap solution to the reference class problem

Methods of estimating the local false discovery rate (LFDR) have been applied to different types of datasets such as high-throughput biological data, diffusion tensor imaging (DTI), and genome-wide association (GWA) studies. We present a model for LFDR estimation that incorporates a covariate into each test. Incorporating the covariates may improve the performance of testing procedures, because it contains additional information based on the biological context of the corresponding test. This method provides different estimates depending on a tuning parameter. We estimate the optimal value of that parameter by choosing the one that minimizes the estimated LFDR resulting from the bias and variance in a bootstrap approach. This estimation method is called an adaptive reference class (ARC) method. In this study, we consider the performance of ARC method under certain assumptions on the prior probability of each hypothesis test as a function of the covariate. We prove that, under these assumptions, the ARC method has a mean squared error asymptotically no greater than that of the other method where the entire set of hypotheses is used and assuming a large covariate effect. In addition, we conduct a simulation study to evaluate the performance of estimator associated with the ARC method for a finite number of hypotheses. Here, we apply the proposed method to coronary artery disease (CAD) data taken from a GWA study and diffusion tensor imaging (DTI) data.


Introduction
Methods of estimating the local false discovery rate (LFDR) [1], not suffering from the bias inherent in estimating other false discovery rates [2], have been applied to various datasets such as high-throughput biological data (e.g., gene expression, proteomics, and metabolomics), diffusion tensor imaging (DTI), and genome-wide association (GWA) study [3][4][5]. As an example, in a GWA study, the methods of estimating the LFDR are used in order to estimate the probability that a single nucleotide polymorphism (SNP) is associated with a disease a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 [6][7][8][9][10]. In addition, in DTI brain scans, the LFDR estimates have been used to estimate the proportion of dyslexic-non-dyslexic differences [5,11].
In many situations, the considered hypotheses are connected by a scientific context. However, ignorance of this scientific context in a data analysis can be misleading, because it may introduce bias into the LFDR estimates [11]. For example, each test in a GWA study corresponds to a specific genetic marker for which previous biological information may be available. Moreover, in a DTI study, each test corresponds to a voxel, where the voxel location can be incorporated as a scientific context.

Motivating example
We consider coronary artery disease (CAD) data [12], where N = 394, 839 SNPs passed the quality control (QC) filtering methods explained in Section 3.2.1. The aim of this study is to identify whether each SNP is associated with a disease. We have two components (z i , x i ) associated with each hypothesis for i = 1, . . ., N, where z i is an observed test statistic and x i presents the minor allele frequency (MAF). For our data, all observed test statistics are used to identify the disease-associated SNPs. Fig 1(A) the N = 394, 839 SNPs used to estimate the LFDR (Section 2). A total of 44 disease-associated SNPs with LFDR estimates lower than 0.2 are identified.
A set of hypotheses or features used to determine the posterior probability of a null hypothesis is called a reference class, and the problem of finding such a set is an example of the reference class problem (e.g., Bickel [13]). For example, considering all SNPs when estimating the LFDR for a specific MAF, instead of considering the different subsets of SNPs, is called the combined reference class (CRC) method (Section 2). From Fig 1(A), consider the example of x i = 0.0306 and z i = −4.3971, with an estimated LFDR of 0.1973 that is close to the threshold of 0.2. For this MAF, we define a reference class of SNPs in such a way that the MAFs are within a symmetric window around x i = 0.0306, with width 2Δ. Different window widths yield different reference classes. Again, each subset of SNPs is used to estimate the LFDR. Fig 1(B) illustrates the LFDR estimates versus the reference class width. This figure shows that changing the reference class width provides different LFDR estimates, raising the important question of how we can estimate the optimal reference class in order to estimate the LFDR. For the considered CAD data, the reference class problem consists of deciding which SNPs should be used to determine whether a SNP is associated with a disease.
The hypotheses can be divided into groups based on the characteristics of the problem. For example, the CAD data can be divided into two distinct groups according to MAFs, low-frequency SNPs (1% � x i � 5%) and common SNPs (x i > 5%). Thus, we need to determine which reference class should be used to determine the posterior probability that the SNP is not associated with the disease occurring at the MAF x i = 0.0306. Should we use the entire set of SNPs, or the low-frequency SNPs [14]? In addition, SNPs can be divided into different classes, such as non-synonymous SNPs, genic SNPs, SNPs in highly conserved regions, SNPs in linkage disequilibrium with many (or few) other SNPs, or categorized SNPs based on their MAFs [12].

Previous research on the reference class problem
Many methods have been proposed for incorporating covariates into statistical techniques for testing multiple hypotheses. Bickel [15] considered the effect of selecting test statistics in estimators of the weighted and unweighted FDR, and found that smaller reference classes of null hypotheses yield lower estimated expected losses than larger reference classes do. Several researchers have applied the idea of incorporating a group structure and weights to improve the statistical power of tests. This group structure can be used when testing multiple hypotheses by assigning weights for the hypotheses or the p-values in each group. Benjamini and Hochberg [16] used a p-value weighting method to evaluate different procedures. Genovese et al. [17] demonstrated that a p-value weighting procedure can be employed to control the FWER and FDR while increasing the statistical power of the test. Subsequently, Wasserman and Roeder [18] introduced an optimal p-value weighting procedure to control the FWER. Sun et al. [19] proposed a stratified false discovery control approach for genetic studies, in which a large number of hypotheses include inherent stratification. In addition, Efron [11] argued that analyzing separate reference classes can be legitimate from a frequentist viewpoint, and Hu et al. [20] proposed a weighting scheme based on a simple Bayesian framework that employs the proportion of null hypotheses that are true within each group. Such an approach can control the FDR for p-values with certain dependence structures. The unknown proportion of true null hypotheses is estimated within each group. Moreover, Zablocki et al. [8] used a hierarchical Bayesian approach to incorporate a set of covariates, where the prior probability that the null hypothesis test is true and the alternative distribution of the test statistic are both modulated by covariates. In contrast to Zablocki et al. [8], instead of specifying the hyperprior distributions required for a hierarchical Bayesian approach, we follow Karimnezhad and Bickel [21] and use an empirical Bayes approach to estimate an optimal reference class to improve the LFDR estimate.
In this study, we assume the prior probability to be a function of covariates (Section 2.1). Then, we propose an adaptive reference class (ARC) method for estimating the LFDR, using a bootstrap approach to estimate the optimal reference class (Section 2.2). We compare the performance of the proposed ARC method and the CRC method using the mean squared error (MSE) as the performance criterion. We prove that, under certain assumptions, the ARC method has an MSE that is asymptotically no greater than that of the CRC method (see S1 File). In addition to the asymptotic results, we conduct a simulation study to investigate the finite dataset performances of the LFDR estimators for each method in Section 3.1. Both the The local false discovery rate estimated via a bootstrap solution to the reference class problem PLOS ONE | https://doi.org/10.1371/journal.pone.0206902 November 26, 2018 asymptotic and simulation results show that, under certain assumptions, the ARC method performs well compared with the CRC method. We present an application of the ARC method on both CAD data and DTI data in Section 3.2 in order to demonstrate the practical importance of deciding between the ARC and CRC methods. Finally, we conclude the paper with a brief discussion in Section 4. The proof for the main theorem is included in S1 File.

Materials and methods
Suppose N null hypotheses H 01 ,. . ., H 0N are considered simultaneously. For example in GWA study, let H 0i denote the null hypothesis that the i th SNP is not associated with the disease. Under the genetic additive model [22], each SNP yields a Wald χ 2 test statistic W i . Under the i th null hypothesis, it holds that W i � w 2 1 , while under the i th alternative hypothesis, we have W i � w 2 1;d , where δ 2 (0, 1) is an unknown noncentrality parameter, following the models employed in [23] and [24]. Under the i th null hypothesis, assume that Z i * N(0, 1), where Z i represents the z-transform that converts the Wald χ 2 statistic into a standard normal statistic. In addition, for DTI data, let H 0i denote the null hypothesis that there is no dyslexic-non-dyslexic difference for the i th voxel. Under the i th null hypothesis, assume that Z i * N(0, 1), where Z i represents the z-transform which converts two-sample t-test satistic into a standard normal statistic.
The observed statistics z = (z 1 ,. . ., z N ) T are considered realizations of Z = (Z 1 ,. . ., Z N ) T . Let A i be an indicator variable for the event that the i th alternative hypothesis H ai is true. Assume that A i 's are independent and identically distributed (i.i.d.) Bernoulli(1 − π 0 ) variables, where π 0 is the prior probability that the i th null hypothesis is true. Let f 0 (z i ) and f 1 (z i ) be the null and alternative density, respectively.
The posterior probability that the i th null hypothesis is true, given Z i = z i is the LFDR [1], and is denoted as C(z i ), where where f(z i ; π 0 ) denotes the mixture density of Z i given by If the null hypothesis is true, then the null density f 0 (z i ) of the statistic Z i is the standard normal, and is called the theoretical null hypothesis [1]. The model defined in Eq (2) with the following method of estimation is called the histogram-based (HB) model [25]. By assuming the theoretical null hypothesis and applying the Poisson regression, the mixture density f(z i ;π 0 ) is estimated by fitting a high-degree polynomial to the histogram counts, denoted by b f i ðzÞ, where the estimate of the proportion of true null hypothesis is denoted by b p 0 ðzÞ. The LFDR estimate b C i ðzÞ is computed by substituting b p 0 ðzÞ and b f i ðzÞ into Eq (1). This model and estimation method are an example of the CRC method.

Proposed model
The described model in Eq (1) extends to the situation that incorporates a covariate related to the scientific context of each hypothesis test. For the CAD dataset, the covariate represents the MAF in the CAD data, while for the DTI data, the location is incorporated as a covariate. Let X = (X 1 ,. . ., X N ) T be i.i.d. random variables. Any test statistics are transformed to the standard normal statistic Z i , for i = 1,. . ., N. The observed statistics vector z = (z 1 ,. . ., z N ) T is considered a realization of Z = (Z 1 ,. . ., Z N ) T . Let A i be the event that the i th alternative hypothesis H ai is , where π 0 (x i ), the prior probability that the i th null hypothesis is true, is an unknown function of the given covariate X i = x i . We denote the posterior probability that the i th null hypothesis is true, given Z i = z i and X i = x i by where the mixture density of Z i conditional on the covariate X i = x i is given by where f 0 (z i ) denotes the null density of Z i and f 1 (z i ;x i ) is the alternative density of Z i . The mixture density in Eq (2) is a special form of Eq (4), where the effect of the covariates is ignored. The quantities π 0 (x i ) and f 1 (z i ;x i ) are unknown. The ARC method is applied to estimate the LFDR in Eq (3). Under the CRC method, the effects of the covariates defined in Eq (1) are ignored, while under the proposed ARC method, some assumptions are considered locally in order to estimate the LFDR C(z i ; x i ), defined in Eq (3).

Adaptive reference class (ARC) method
Under the ARC method, certain assumptions only hold locally within a symmetric window for each covariate. Let a symmetric window of width 2Δ be centered at given covariate Let Δ 0 denote the smallest considered value of the tuning parameter Δ. The reference class z D i contains components z j such that their covariates are within a distance Δ of x i . Denoting the expected dimension of the reference class Here, d D i increases with number of null hypothesis tests N, provided that the probability is positive. For each reference class z D i , we may apply any LFDR estimation approach such as the HB model in Section 2. In contrast to the CRC method, instead of using the entire collection of observed statistics z, only the reference class z D i is used to obtain the LFDR estimate b C i ðz D i Þ. The choice of tuning parameter Δ influences the LFDR estimate. Here, we choose the one that results in the lowest error when estimating the LFDR, which is called the optimal tuning parameter.

Optimal tuning parameter
The optimal tuning parameter Δ specifies the symmetric window width of a given reference class, and is determined by minimizing the errors resulting from the bias and the variance. In the following, we introduce several notational conventions.
Let the mean and variance of the estimator b respectively. When X i = x i , the prediction bias for the estimator b Determining the optimal choice of Δ depends on the choice of the loss function used to measure the errors in the estimation of the LFDR. A good estimator is accurate in the sense that its estimates are as close to the true values as possible. Accuracy measures typically take into account the difference between the estimated value and the true value. Using the MSE as a an accuracy measure is a commonly used way to indicate how close the estimator is to the true value by incorporating both the bias and the variance [26]. Hence, using the MSE provides our criterion for defining the optimal tuning parameter Δ. The MSE for the estimator b It can be shown that the portion of MSE that depends on Δ is given by Here, we employ the errors resulting from the bias and the variance in Eq (10) to determine the optimal Δ. Denoting the optimal Δ by D ? 0i , we have To estimate D ? 0i , it is necessary to estimate the variance and the prediction bias of the LFDR estimator, which we do using the bootstrap approach.

Bootstrap estimation of the optimal tuning parameter
The estimate of C(z i ; x i ) based on the b th bootstrap reference class is denoted by b Þ provide the estimators b mðD; BÞ and b s 2 ðD; BÞ, which we use to estimate μ Δ (x i ) and In order to estimate the prediction bias in Eq (10), we need to estimate π 0 (x i ). We propose using a reference class z The estimator of errð b (10) is denoted by c errðD; D 0 ; BÞ, and is computed by simply summing the bootstrap variance in Eq (13) and the squared bootstrap prediction bias in Eq (14). Let the optimal D ? 0i be denoted as b D ? 0i , which is given by b After estimating the optimal tuning parameter b D ? 0i (see Algorithm 1), the optimal reference i Þ is computed using the ARC method. We compare the performance of two estimators using the MSE.
Algorithm 1 Pseudo-code of the estimation of the optimal tuning parameter for one simulated dataset.  Let π 0 (x i ) denote the true prior probability that the i th null hypothesis is true. In GWA study, the null hypothesis means no disease association, and in the DTI study, it means no differences between dyslexic and non-dyslexic children. For a given x 0 , we suppose that the unknown prior probability π 0 (X i ) is a step function of the covariate X i , given by where the prior probabilities π 01 and π 02 are both unknown, and π 01 � π 02 . This function splits the N tests into two distinct groups, such that in each group, the test statistics are i.i.d. Moreover, the simplified function in Eq (16) will have a biologically meaningful interpretation. As an example, in CAD data, the N SNPs can be divided into two distinct groups; disease-associated and not disease-associated. The two groups may have different choices of prior probabilities. In addition, under the assigned values of x 0 and Δ 0 , the observed vector of covariates x may be partitioned into three regions for j = 1,.., N. Therefore, the following theorem ensures us that, under the assumptions explained above for π 0 (X i ) Eq (16) and the region of covariates Eq (17), the proposed ARC method has an MSE that is asymptotically no greater than that of the CRC method. The proof of the theorem is given as a series of lemmas (see S1 File).

Theorem 1 Let b C i ðzÞ be a weakly consistent estimator of C(z i ) when N becomes large. If
where B is the number of bootstrap samples.

Remark 1
The step function of the covariates in Eq (16) and the partitioning of the covariates in Eq (17) affect the weak consistency of b mðD 0 ; BÞ Eq (13) as an estimator of π 0 (X i ) in Eq (16) (see S1 File, Lemma 2). Then, for x i 2 R 2 ðx 0 ; D 0 Þ, the bootstrap mean b mðD 0 ; BÞ is not a weakly consistent estimator of π 0 (X i ).
Remark 2 For x i 2 R 3 ðx 0 ; D 0 Þ, similar results to Theorem 1 can be derived.

Simulation study
The aim of the simulation analysis presented here is to compare the finite dataset performances of the CRC and ARC methods when estimating the LFDR in Eq (3). In this section, each test statistic is assigned a prior probability that is a function of the covariate. We assume that the proportion of disease-associated tends to be very small. Then, we present several simulation studies, each with a different value of x 0 2 [0.05, 0.40]. The datasets are simulated as follows. In each simulation, we randomly generate 1000 datasets, each corresponding to an artificial case-control study. For each dataset, we simultaneously generate both the auxiliary information and the observed Wald χ 2 test statistics, denoted by x i and w i , respectively. Each observed covariate x i is generated randomly from the uniform distribution between 0 and 1.
In each simulation, the true prior probability π 0 (x i ) is determined according the given value of x 0 as a function of the observed covariate in Eq (16). From (16), let � p 0 ¼ Eðp 0 ðX i ÞÞ for i = 1,. . ., N, where � p 0 2 ½0:60; 0:95�. To generate the observed statistics, we generate each A i * Bernoulli(1 − π 0 (x i )) independently. To generate the observed χ 2 test statistics, if A i = 1, the observed statistics are sampled from w 2 1;d with a noncentrality parameter δ. For each given value of x 0 , a different value of δ 2 [1.5, 7] is assigned. The Wald χ 2 test statistics when A i = 0 are sampled from w 2 1 . The Wald χ 2 test statistics are then transformed into zvalues.
Each dataset has N pairs (z i , x i ). The total number of pairs N is equal to 300, 000. In each simulation, a pair (z i , x i ) is selected randomly from each dataset to estimate C(z i ; x i ). For a given covariate x i , the estimators of the LFDR are computed using the two methods. Under the ARC method, Δ 0 has to be specified in advance in order to determine b D ? 0i . We consider the range Δ 0 2 (0, x 0 ), and set B = 1000. Thus, under the HB model described in Section 1.1, we The conditional MSE approximations used to measure the performances of the estimators are given by The local false discovery rate estimated via a bootstrap solution to the reference class problem and the marginal MSE approximations are computed as follows: i Þ for the ARC method. The relative MSE of the two estimators is a convenient measure for comparing the MSEs. The conditional and marginal relative MSEs are denoted by respectively. From Fig 2, we observe that the performance of the ARC method depends on the Δ 0 values and the region of the covariates. When � p 0 2 ½0:60; 0:95�, increasing the value of Δ 0 results in a smaller MSE approximation for the ARC method in the regions R 1 ðx 0 ; D 0 Þ and R 3 ðx 0 ; D 0 Þ that follow the results in Theorem 1. Then, increasing � p 0 , Fig 2(D) shows that the MSE approximation for the proposed ARC method is greater than that of the CRC method in the region R 2 ðx 0 ; D 0 Þ, for some Δ 0 . Fig 2 shows that the ARC method has a smaller marginal MSE approximation than that of the CRC method. From Table 1, if we consider the true prior probabilities for all SNPs to be independent of the covariate, that is, π 0 (x i ) = π 0 for i = 1,. . ., N, then the CRC method has a smaller MSE than that of the ARC method. In such cases, the CRC method should be used instead of the ARC method to analyze the data.

Real data analysis
We apply both the ARC and CRC methods to the CAD and DTI datasets, and compare the disease-associated SNPs and dyslexic-non-dyslexic difference voxels under each method, respectively. The purpose of this comparison is to demonstrate the practical difference between the methods rather than to determine which method performs better.

CAD data analysis.
The CAD dataset originating from the United Kingdom includes 500, 568 SNPs genotyped for 2, 000 cases, and 3, 000 combined on 22 autosomal chromosomes. The control individuals come from two groups: 1500 individuals from the 1958 British Birth Cohort (58C), and 1500 individuals selected from UK blood services (UKBS) controls. Following [12], we use quality control filtering methods to exclude SNPs based on the exact Hardy-Weinberg equilibrium (HWE) test as well as individuals or SNPs with many missing genotypes. The following three filters are applied sequentially. For the first filter, an SNP fails when the proportion of missing data proportion is greater than 0.05 or when the minor allele frequency (MAF) is smaller than 0.05 and the missing data proportion is greater than 0.01. For the second filter, SNPs with p-values smaller than 0.05 using the exact HWE test in the combined controls are rejected. Finally, for the third filter, we reject SNPs with p-values smaller than 5 × 10 −7 using trend tests and general genotype tests between each case and the combined controls. We also excluded SNPs with MAFs smaller than 0.01. A total of N = 394839 SNPs passed these quality control filtering methods and are used to identify disease-associated SNPs. We apply both the ARC and the CRC methods to the CAD data and compare the disease-associated SNPs under each method.
The CAD related data introduced in Section 1.1, with N = 394, 839 SNPs, is employed in the following statistical analysis. Under the CRC method, all observed statistics z are considered to estimate the LFDR, where 44 disease-associated SNPs are identified with LFDR lower than 0.2. The MAF is incorporated as a covariate. Under the ARC method, the optimal reference class is estimated for each MAF, which depends on the choice of Δ 0 . Fig 3(A) presents the LFDR estimates under the CRC method versus the ARC method when Δ 0 = 0.001. The results show that, 160 SNPs are disease-associated based on the ARC method, while the CRC method detects 44 disease-associated SNPs. From Fig 3(B), we find that changing Δ 0 has a direct effect on the number of disease-associated SNPs. Under the ARC method, increasing the value of Δ 0 brings the proportion of disease-associated SNPs closer to the corresponding proportion under the CRC method. The local false discovery rate estimated via a bootstrap solution to the reference class problem

DTI data analysis.
Schwartzman et al. [5] used advanced MRI technology, DTI, to measure water diffusion in the human brain by scanning the brain. DTI is used to map and characterize the three-dimensional diffusion of a water molecule randomly moving in brain tissue to provide information regarding the direction of diffusion. The measured diffusivity; that is, the diffusion coefficient, relates the diffusive flux to a concentration gradient [27], and has units of (mm 2 /s). In this study, 12 children were tested and divided equally in each group (i.e., dyslexic or non-dyslexic group). Each child received DTI brain scans in N = 15443 locations, with each represented by its own voxel's response. The aim is to determine the dyslexicnon-dyslexic difference at the i th voxel (location), in relation to reading development in children aged 7-13 [28]. Each test corresponds to a specific voxel. We have two components (z i , x i ) associated with each hypothesis for i = 1,. . .,N, where z i is an observed test statistic that compares the dyslexic children with those who are not (see in Section 2), and x i is the location (i.e., the distance from back of brain to the front). We apply both the ARC and CRC methods to the DTI data and compare the dyslexic-non-dyslexic difference voxels under each method. The DTI brain scans data with a total of N = 15443 locations, each represented by its own voxel's response, is employed in the following statistical analysis. Let The observed statistics w = (w 1 , w 2 ,. . ., w N ) T are considered as a realization of Under the i th null hypothesis, it holds that W i * χ 1 , while under the i th alternative hypothesis W i * χ 1,δ , where δ 2 (0, inf) is an unknown noncentrality parameter, according to the models employed in [23,24]. Therefore, the LFDR in Eq (1) is defined as follows where g 0 (w i ) * χ 1 (i.e., null density), and g(w i ; π 0 , δ) denotes the mixture density given by The local false discovery rate estimated via a bootstrap solution to the reference class problem where g 1 (w i ; δ) represents the unknown alternative density. Under the CRC method, all observed statistics w are considered to estimate the LFDR using the type II maximum likelihood estimation (MLE) model [24] lðp 0 ; dÞ ¼ where b p 0 ¼ 0:923 and b d ¼ 3:7, and 119 dyslexic-non-dyslexic difference voxels are identified with LFDR lower than 0.2. Then, the brain location is incorporated as a covariate. Under the ARC method, the optimal reference class is estimated for each location, which depends on a choice of Δ 0 . Fig 4(A) presents the LFDR estimates under the CRC method versus the ARC method when Δ 0 = 20. We observe from Fig 4(B), that changing Δ 0 has a direct effect on the number of dyslexic-non-dyslexic difference voxels. Under the ARC method, increasing the value of Δ 0 brings the proportion of dyslexic-non-dyslexic difference voxels closer to the corresponding proportion under the CRC method.

Discussion and conclusion
In this study, we employ a novel approach that incorporates a covariate (i.e., a scientific context corresponding to each hypothesis test) to improve the LFDR estimate when identifying alternative hypotheses. Using this approach, both the test statistic distribution under the alternative hypothesis and the prior probability that the null hypothesis is true, are modulated by the covariate. In the case where the prior probability π 0 (X i ) is the step function given in Eq (16), Theorem 1 states that the ARC method has an MSE asymptotically no greater than that of the CRC method. It would be interesting to investigate whether this result holds for a general prior probability. We leave this topic for future research. In addition, the simulation indicates that the ARC method performs in comparison to the CRC method for a finite number of hypotheses. Our simulation results confirm that for regions R 1 ðx 0 ; D 0 Þ and R 3 ðx 0 ; D 0 Þ, the LFDR estimator associated with the ARC method has a smaller MSE approximation than that of the CRC method (see Fig 2). Moreover, we could not prove the weak consistency of b mðD 0 ; BÞ as an estimator of π 0 (x i ) in region R 2 ðx 0 ; D 0 Þ (see S1 File, Lemma 2). The ARC method was applied to both CAD and DTI datasets, as illustrated in Figs 3 and 4. Regardless of LFDR estimation methods (i.e., HB and MLE), by increasing the value of the tuning parameter Δ 0 , the proportion of significant null hypotheses decreases, and approaches the proportion based on the CRC method. This suggests that further investigation may be necessary on how the tuning parameter Δ 0 can be controlled to improve results.
Supporting information S1 File. The proof of Theorem 1 proceeds by a series of lemmas. (PDF)