The PIT-trap—A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses

Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties arise when extending residual resampling to regression settings where residuals are not identically distributed (and thus not amenable to bootstrapping). Common examples include logistic or Poisson regression and their generalizations to clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap; it assumes that data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits a key property not afforded by any other residual resampling approach: the marginal distribution of the data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals preserves correlation in the data without the need for it to be modelled, a key point of difference from a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and is shown via simulation to have improved properties compared with competing resampling methods.
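The resampling scheme described above can be sketched in a few lines. This is a minimal illustration under assumptions chosen here, not the authors' implementation: Poisson margins, two response columns, and a single binary covariate $x_i$, for which the maximum-likelihood fit of a log-linear Poisson model is simply the per-group column mean. All function names (`poisson_cdf`, `pit_trap`, etc.) and the simulated data are hypothetical. PIT-residuals are computed as $U = F(y)Q + F(y-1)(1-Q)$ with $Q$ uniform, whole rows of the residual matrix are resampled, and each resampled row is back-transformed with the fitted margins of the target row $i$.

```python
import math
import random


def poisson_cdf(y, mu):
    """F(y) = P(Y <= y) for Y ~ Poisson(mu); F(y) = 0 for y < 0."""
    if y < 0:
        return 0.0
    term = total = math.exp(-mu)  # P(Y = 0)
    for k in range(1, y + 1):
        term *= mu / k  # P(Y = k) from P(Y = k - 1)
        total += term
    return min(total, 1.0)


def poisson_quantile(u, mu, max_y=10_000):
    """Generalised inverse F^{-1}(u) = min{y : F(y) >= u}."""
    y = 0
    term = total = math.exp(-mu)
    while total < u and y < max_y:
        y += 1
        term *= mu / y
        total += term
    return y


def pit_residual(y, mu, rng):
    """Randomised PIT-residual U = F(y) Q + F(y - 1) (1 - Q), Q ~ Uniform(0, 1)."""
    q = rng.random()
    return poisson_cdf(y, mu) * q + poisson_cdf(y - 1, mu) * (1 - q)


def fit_group_means(Y, x):
    """Fitted means mu[g][j] of a log-linear Poisson model in a binary x
    (for this model the maximum-likelihood fit is the per-group column mean)."""
    p = len(Y[0])
    mu = {}
    for g in (0, 1):
        rows = [Y[i] for i in range(len(Y)) if x[i] == g]
        mu[g] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return mu


def pit_trap(Y, x, rng):
    """One PIT-trap resample: resample whole rows of the PIT-residual matrix,
    then back-transform with the fitted margins of the target row i."""
    n, p = len(Y), len(Y[0])
    mu = fit_group_means(Y, x)
    U = [[pit_residual(Y[i][j], mu[x[i]][j], rng) for j in range(p)]
         for i in range(n)]
    Y_star = []
    for i in range(n):
        u_row = U[rng.randrange(n)]  # rows kept intact: preserves correlation
        Y_star.append([poisson_quantile(u_row[j], mu[x[i]][j]) for j in range(p)])
    return Y_star


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    va = sum((s - ma) ** 2 for s in a)
    vb = sum((t - mb) ** 2 for t in b)
    return cov / math.sqrt(va * vb)


# Hypothetical simulated data: the two counts in a row share a latent Poisson
# term, so they are positively correlated; the covariate x shifts both means.
rng = random.Random(2024)
n = 2000
x = [i % 2 for i in range(n)]
Y = []
for i in range(n):
    shared = poisson_quantile(rng.random(), 2.0)
    m = 1.0 + 2.0 * x[i]
    Y.append([shared + poisson_quantile(rng.random(), m) for _ in range(2)])

Y_star = pit_trap(Y, x, rng)
print(round(corr([r[0] for r in Y], [r[1] for r in Y]), 2),
      round(corr([r[0] for r in Y_star], [r[1] for r in Y_star]), 2))
```

With a constant covariate the back-transform would return, up to rounding, the very row that was resampled, so the PIT-trap would coincide with resampling rows of the data; it is the dependence of the fitted margins on $x_i$ that makes the two differ.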


S1 FILE. PROOFS OF THEOREMS
We will first prove two lemmas relating to the situation where the cumulative distribution used for the PIT-trap has been misspecified, but in such a way that PIT-residuals remain identically distributed.
Lemma 1. Let $U = F(Y)Q + F(Y^-)(1 - Q)$, with $Q \sim \mathrm{Uniform}(0, 1)$ independent of $Y$, be the probability integral transform, but where the cumulative distribution function $F(y)$ may have been misspecified, and the true distribution function is $G(y) = h\{F(y)\}$ for some function $h(\cdot)$.

Then
$$P(U \le u) = h(u).$$

Proof. For simplicity we consider the continuous case only; the proof in the discrete case follows by a similar method. Let $u = F(y)$ be the observed value of the probability integral transform residual. Then
$$P(U \le u) = P\{F(Y) \le F(y)\} = P(Y \le y) = G(y) = h\{F(y)\} = h(u).$$

Lemma 1 is used directly in the proof of Lemma 2 below.
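Lemma 1 can be checked numerically in a simple continuous case. The sketch below is an illustration under assumptions chosen here, not part of the proof: $F$ is taken to be the Exp(1) distribution function, while the data are actually the maximum of two independent Exp(1) draws, so that $G(y) = \{F(y)\}^2$ and $h(u) = u^2$. The helper name `exp_cdf` is hypothetical. The empirical distribution of the misspecified PIT-residuals should track $h(u)$ rather than $u$.

```python
import math
import random


def exp_cdf(y):
    """Assumed (misspecified) cdf F: exponential with rate 1."""
    return 1.0 - math.exp(-y)


rng = random.Random(7)
n = 100_000
# True data: maximum of two independent Exp(1) draws, so G(y) = F(y)**2, h(u) = u**2.
Y = [max(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(n)]
U = [exp_cdf(y) for y in Y]  # PIT-residuals computed with the wrong F

for u in (0.25, 0.5, 0.9):
    empirical = sum(v <= u for v in U) / n
    print(u, round(empirical, 3), u ** 2)  # empirical P(U <= u) versus h(u)
```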
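Lemma 2. Consider a set of $n$ random variables $Y_1, \ldots, Y_n$ with distribution function $G_i(y)$ for $Y_i$. A PIT-trap sample $Y^*_1, \ldots, Y^*_n$ is computed as $Y^*_i = F_i^{-1}(U^*_i)$ using a (possibly misspecified) set of cumulative distributions, denoted $F_i(y)$ for $Y_i$, where $F_i^{-1}$ is the (generalised) inverse of $F_i$.

If $G_i(y) = h\{F_i(y)\}$ for some function $h(\cdot)$, then for each $i$:
$$P(Y^*_i \le y) = P\{U^*_i \le F_i(y)\} = h\{F_i(y)\} = G_i(y),$$
since the bootstrap sample $U^*_i$ is drawn at random with replacement from the set of observed PIT-residuals, each of which has distribution function $h(\cdot)$ by Lemma 1.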
Lemma 2 shows that if probability integral transform residuals are identically distributed, then the $Y^*_i$ preserve the marginal distribution of the $Y_i$.
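Lemma 2 can be illustrated in the same misspecified setting, now with observation-specific distributions so that the back-transform is not simply a resample of the data. This is a sketch under assumptions chosen here: each $Y_i$ is $s_i$ times the maximum of two Exp(1) draws, with hypothetical scales $s_i \in \{1, 3\}$; the assumed $F_i$ is exponential with scale $s_i$, so $G_i = h(F_i)$ with $h(u) = u^2$ for every $i$. The PIT-residuals are resampled with replacement and back-transformed with $F_i^{-1}$, and the PIT-trap values for the observations with $s_i = 3$ are compared with the true $G_i$.

```python
import math
import random


def F(y, s):
    """Assumed (misspecified) cdf for Y_i: exponential with scale s."""
    return 1.0 - math.exp(-y / s)


def F_inv(u, s):
    """Inverse of the assumed cdf F(., s)."""
    return -s * math.log(1.0 - u)


def G(y, s):
    """True cdf: Y_i is s times the max of two Exp(1) draws, so G = h(F), h(u) = u**2."""
    return F(y, s) ** 2


rng = random.Random(11)
n = 40_000
scale = [1.0 if i % 2 == 0 else 3.0 for i in range(n)]  # hypothetical heterogeneity
Y = [s * max(rng.expovariate(1.0), rng.expovariate(1.0)) for s in scale]
U = [F(y, s) for y, s in zip(Y, scale)]          # identically distributed PIT-residuals
U_star = [rng.choice(U) for _ in range(n)]       # resampled with replacement
Y_star = [F_inv(u, s) for u, s in zip(U_star, scale)]  # back-transform with F_i

# Compare the PIT-trap sample with G_i for the observations with scale 3.
idx = [i for i in range(n) if scale[i] == 3.0]
for y in (1.0, 3.0, 6.0):
    emp = sum(Y_star[i] <= y for i in idx) / len(idx)
    print(y, round(emp, 3), round(G(y, 3.0), 3))
```

Comparing the same empirical proportions with the assumed $F_i$ instead (for example $F(3, 3) = 1 - e^{-1} \approx 0.632$ against $G(3, 3) \approx 0.400$) shows that the PIT-trap reproduces the true margin, not the misspecified one.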

Proof of Theorem 1
We will prove Theorem 1 by showing that asymptotically, the conditions of Lemma 2 are satisfied.
Hence, up to a term $O_p(n^{-1/2})$, $F_j(y; \hat\theta, x_i)$ satisfies the conditions of Lemma 2 (where $h(\cdot)$ is the identity function). By Lemma 2, PIT-trap values follow the true cumulative distribution function $F_j(y; \theta, x_i)$, up to a term no larger than $O_p(n^{-1/2})$.
Note: While this argument uses the result that the fitted $F_j(y; \hat\theta, x_i)$ approximate the true distribution $F_j(y; \theta, x_i)$, we can relax this assumption along the lines of Lemma 1, so that the only requirement is that the PIT-residuals are (asymptotically) identically distributed, $P(U_{ij} \le u) = h(u)$ for each $(i, j)$. Thus the PIT-trap can preserve the marginal distribution of the data under certain forms of model misspecification.
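The conclusion of Theorem 1, that PIT-trap values built from estimated parameters follow the true margin up to $O_p(n^{-1/2})$, can be explored by simulation. This is a sketch under assumptions chosen here, not the setting of the theorem in full generality: Poisson responses with hypothetical true means 2 and 6 in two groups defined by a binary covariate, fitted by group means (the Poisson maximum-likelihood fit for this model). The hypothetical function `trap_distance` returns the largest gap between the empirical cdf of the PIT-trap values in group 0 and the true Poisson(2) cdf; the gap should shrink as $n$ grows, although a single simulated run is only indicative of the rate.

```python
import math
import random


def pois_cdf(y, mu):
    """P(Y <= y) for Y ~ Poisson(mu); 0 for y < 0."""
    if y < 0:
        return 0.0
    term = total = math.exp(-mu)
    for k in range(1, y + 1):
        term *= mu / k
        total += term
    return total


def pois_quantile(u, mu, max_y=10_000):
    """Generalised inverse min{y : F(y) >= u}."""
    y, term = 0, math.exp(-mu)
    total = term
    while total < u and y < max_y:
        y += 1
        term *= mu / y
        total += term
    return y


def trap_distance(n, mu_true, rng):
    """Max |ecdf of PIT-trap values with x = 0  -  true Poisson(mu_true[0]) cdf|."""
    x = [i % 2 for i in range(n)]
    Y = [pois_quantile(rng.random(), mu_true[g]) for g in x]
    mu_hat = [sum(y for y, g in zip(Y, x) if g == k) / x.count(k) for k in (0, 1)]
    U = []
    for y, g in zip(Y, x):  # randomised PIT-residuals under the fitted model
        q = rng.random()
        U.append(pois_cdf(y, mu_hat[g]) * q + pois_cdf(y - 1, mu_hat[g]) * (1 - q))
    # Resample residuals and back-transform with the fitted margin of each target.
    Y_star = [pois_quantile(rng.choice(U), mu_hat[g]) for g in x]
    y0 = [y for y, g in zip(Y_star, x) if g == 0]
    return max(abs(sum(v <= t for v in y0) / len(y0) - pois_cdf(t, mu_true[0]))
               for t in range(20))


rng = random.Random(3)
for n in (200, 2_000, 20_000):
    print(n, round(trap_distance(n, (2.0, 6.0), rng), 3))
```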

Proof of Theorem 3
The proof follows via the usual Edgeworth expansion approach in ?.
If $T = g(\mathbf{Y})$ admits an Edgeworth expansion, then that expansion has terms of order $n^{-1/2}$ and $n^{-1}$ with coefficient functions $p_1(t)$ and $p_2(t)$, where $p_1(t)$ is an odd polynomial function of the skewness of $T$, $p_2(t)$ is an even polynomial function of the skewness and kurtosis of $T$, and these moments are evaluated with respect to the distribution of the data matrix $\mathbf{Y}$, which is characterized by its margins $F(y; \theta, x_i)$ and the correlation between PIT-residuals, $\mathrm{var}(U_i) = \Sigma$.
If $\mathbf{Y}$ is discrete then the same type of expansion applies, but only at continuity-corrected points and not at all $t$ (?).
Under the same assumptions, the distribution of the PIT-trap statistic $T^* = g(\mathbf{Y}^*)$ under resampling admits a similar Edgeworth expansion, with $\hat p_1(t)$ and $\hat p_2(t)$ in place of $p_1(t)$ and $p_2(t)$, where $\hat p_1(t)$ and $\hat p_2(t)$ are evaluated with respect to PIT-trapped data $\mathbf{Y}^*$ whose marginal distribution is $F(y; \theta, x_i)$, and where the correlation between PIT-trapped residuals is $\mathrm{var}^*(U^*_i) = \hat\Sigma$.
Now from Theorem 1, the cumulative distribution function of a PIT-trap value $Y^*_{ij}$ is $F(y; \theta_j, x_i) + O_p(n^{-1/2})$, and from Theorem 1, $\mathrm{var}^*(U^*_i) = \hat\Sigma$, whose entries differ from those of $\Sigma$ by $O_p(n^{-1/2})$. Since $F(y; \theta_j, x_i)$ and $\Sigma$ characterize the joint distribution of the $Y_i$, $\hat p_k(t) = p_k(t) + O_p(n^{-1/2})$ for any $k$ for which the $k$th moment of $Y_{ij}$ is defined.
Hence the coefficients of $n^{-1/2}$ in the above two Edgeworth expansions match to first order, and $P^*(T^* \le t) = P(T \le t) + O_p(n^{-1})$. As in ?, $\hat p_k(t)$ and $p_k(t)$ are odd functions for odd $k$. Hence the odd terms cancel when calculating a two-tailed probability, removing the coefficient of $n^{-1/2}$ in each expansion, and the coefficients of $n^{-1}$ match to first order, so
$$P^*(|T^*| \le t) = P(|T| \le t) + O_p(n^{-3/2}).$$
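The order bookkeeping in this proof can be summarised as follows. This is a sketch rather than a restatement of the authors' expansions: $q_1$ and $q_2$ are generic coefficient functions of the distribution-function expansions (notation introduced here, distinct from the $p_k$ above), $\Phi$ is assumed to be the standard normal limit of the pivotal statistic, the expansions are taken to hold at the points $t$ considered (continuity-corrected points in the discrete case), and, as the proof establishes for the $p_k$, the bootstrap coefficients are assumed to differ from their population counterparts by $O_p(n^{-1/2})$. The last line uses the cancellation of the $n^{-1/2}$ terms in two-tailed probabilities argued above.

```latex
\begin{align*}
P(T \le t) &= \Phi(t) + n^{-1/2} q_1(t) + n^{-1} q_2(t) + O(n^{-3/2}), \\
P^*(T^* \le t) &= \Phi(t) + n^{-1/2} \hat q_1(t) + n^{-1} \hat q_2(t) + O_p(n^{-3/2}), \\
\hat q_k(t) &= q_k(t) + O_p(n^{-1/2}), \qquad k = 1, 2, \\
P^*(T^* \le t) - P(T \le t)
  &= n^{-1/2} O_p(n^{-1/2}) + n^{-1} O_p(n^{-1/2}) + O_p(n^{-3/2}) = O_p(n^{-1}), \\
P^*(|T^*| \le t) - P(|T| \le t)
  &= n^{-1} O_p(n^{-1/2}) + O_p(n^{-3/2}) = O_p(n^{-3/2}).
\end{align*}
```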