Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples

doi:10.1371/journal.pone.0131765

Table 1.

Unequal inclusion probabilities for stratified sample.

Fig 1.

Simulated population sample unit inclusion probabilities for stratified sample and inverse probability bootstrap (IPB) of stratified sample.

Table 2.

Variance-covariance matrix input to R function cluster.Gen to simulate correlated and clustered explanatory (X1, ..., X6) and response (Y) ecological variables.

Table 3.

Simulated population for example 2: distribution by stratum, and target sample size by stratum for stratified sample design.

Table 4.

Bias and standard deviation of predicted values, by sampling design and modeling method, for simulated ecological data in example 2.

Fig 2.

Estimated mean prediction errors, as a percentage of the standard deviation of the response variable, with 95% confidence intervals, for linear regression of the response variable on independent variables and for boosted regression tree analysis, using simple random samples (SRS), stratified samples fit without accounting for design weights (Strat), and inverse probability bootstrap sampling (IPB).

Fig 3.

Distribution of estimated slopes for linear regression, using simple random samples (SRS), stratified samples fit ignoring sample inclusion probabilities (Strat), and regression using Inverse Probability Bootstrap samples (IPB).

Table 5.

Linear model regression results for simulated data.

Standard errors refer to the precision of the parameter estimates.

Fig 4.

Distribution of estimated slopes for quantile regression, using simple random samples (SRS), stratified samples fit ignoring sample inclusion probabilities (Strat), and regression using Inverse Probability Bootstrap samples (IPB).

Table 6.

Quantile regression results for simulated data.

Standard errors refer to the precision of the parameter estimates.

Fig 5.

Estimated mean prediction error, as a percentage of the standard deviation of the response variable, for linear regression and boosted regression tree analysis, across a range of variability levels in sample inclusion probability, using simple random samples (SRS), stratified samples fit without accounting for design weights (Strat), and Inverse Probability Bootstrap sampling (IPB).

Table 7.

Definition and measurement units for CHaMP habitat metrics used in example 3 (www.champmonitoring.org).

Fig 6.

Mean and 95% confidence intervals for cross validation prediction error for regression of steelhead density on independent variables, and boosted regression tree analysis of steelhead density, as a percentage of the mean observed steelhead density at all sites.

Models are built on data from a stratified sample ignoring sample inclusion probabilities (Strat), and on Inverse Probability Bootstrap samples (IPB).

Table 8.

Cross validation results for example 3: bias and standard deviation of predicted-measured ln(Steelhead per m²).

Table 9.

Parameter estimates for example 3, regression of ln(steelhead density, fish/m²) on selected habitat parameters, for models that: ignore sample inclusion probabilities, and utilize IPB sampling to account for sample inclusion probabilities.

Table 10.

Parameter estimates for example 3, comparison for 95th percentile quantile regression of ln(steelhead density, fish/m²) on selected habitat parameters, for models that: ignore sample inclusion probabilities, and utilize IPB sampling to account for sample inclusion probabilities.
