Table 1.
Parameters for true data generating processes (DGP) and outcome generating models (OGM).
In all scenarios, the true vector of coefficients is equal to and the error distribution is set to ε ∼ N(0, 0.3²). 0_p denotes the p-dimensional vector of zeros.
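Table 1 names the ingredients of the DGP and OGM but not the sampling code. The following is a minimal stdlib-Python sketch of one such scenario under stated assumptions: the features are standard normal with mean 0_p and a common pairwise correlation rho (built here from a shared latent factor, which only works for rho ≥ 0), the outcome model is linear with the intercept stored as the first entry of beta, and ε ∼ N(0, 0.3²). The function name `simulate_linear_data` is hypothetical and not taken from the paper.

```python
import math
import random

def simulate_linear_data(n, beta, rho=0.2, sigma=0.3, seed=None):
    """Hypothetical helper: n rows of p = len(beta) - 1 equicorrelated N(0, 1)
    features and outcomes y = beta_0 + x^T beta_{1:p} + eps, eps ~ N(0, sigma^2).
    Assumption: equicorrelated Gaussian features with rho >= 0."""
    rng = random.Random(seed)
    p = len(beta) - 1
    X, y = [], []
    for _ in range(n):
        # A shared factor z0 gives unit variances and Cor(X_i, X_j) = rho for i != j.
        z0 = rng.gauss(0.0, 1.0)
        x = [math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(p)]
        eps = rng.gauss(0.0, sigma)  # random.gauss takes the standard deviation
        y.append(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)) + eps)
        X.append(x)
    return X, y
```

For example, `simulate_linear_data(100, [1.0, 1.0, 1.0])` mimics the p = 2, n = 100 setting used in several figures.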
Table 2.
Deviations from the true DGP and OGM for parametric and Plasmode simulation.
Fig 1.
Relative error in MSE estimation for individual coefficients for different types of Plasmode simulation compared to parametric simulation under the assumption of the true DGP and OGM.
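The captions do not spell out the relative error formula. A common reading, used here as an assumption, is (MSE_sim − MSE_ref) / MSE_ref for each coefficient, where MSE_ref is the MSE under the true DGP and OGM and MSE_sim is the Monte Carlo MSE obtained from a given simulation design. The sketch below computes both pieces; the helper names are hypothetical.

```python
def coefficient_mse(estimates, beta):
    """Hypothetical helper: Monte Carlo MSE of each coefficient,
    i.e. the mean over replications of (beta_hat_j - beta_j)^2."""
    reps = len(estimates)
    return [sum((est[j] - beta[j]) ** 2 for est in estimates) / reps
            for j in range(len(beta))]

def relative_error(mse_simulated, mse_reference):
    """Assumed definition: (simulated - reference) / reference per coefficient."""
    if len(mse_simulated) != len(mse_reference):
        raise ValueError("MSE vectors must have the same length")
    return [(s - r) / r for s, r in zip(mse_simulated, mse_reference)]
```

A positive value means the simulation design overestimates the true MSE of that coefficient; a negative value means it underestimates it.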
Fig 2.
Absolute value of the relative error in MSE estimation averaged over individual coefficients, for different types of Plasmode simulation compared to parametric simulation under the assumption of the true DGP and OGM, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
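The averaged figures reduce the per-coefficient relative errors to one number by taking the mean of their absolute values. A short sketch, assuming the relative error of coefficient j is (MSE_sim,j − MSE_ref,j) / MSE_ref,j; the function name is hypothetical.

```python
def mean_absolute_relative_error(mse_simulated, mse_reference):
    """Hypothetical helper: mean over coefficients of |(sim - ref) / ref|."""
    if len(mse_simulated) != len(mse_reference):
        raise ValueError("MSE vectors must have the same length")
    rel = [(s - r) / r for s, r in zip(mse_simulated, mse_reference)]
    return sum(abs(e) for e in rel) / len(rel)
```

Taking absolute values before averaging prevents over- and underestimation in different coefficients from cancelling out.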
Fig 3.
Absolute value of relative error in MSE estimation for individual coefficients when the assumed feature distribution in parametric simulation deviates from the true distribution, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
Fig 4.
Absolute value of the relative error in MSE estimation averaged over individual coefficients for different types of Plasmode simulation compared to parametric simulation, under the assumption of the true data generating process and outcome generating model, for p = 50, n = 100, β = 1_51, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j, where 1_k denotes the k-dimensional vector of ones.
Fig 5.
Relative error in MSE estimation for individual coefficients when the assumed mean of the marginal distribution of the second feature in parametric simulation deviates from the true mean, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
N(0,1), N(μ,1) denotes that the first feature is generated from a standard normal (truth), and the second feature is generated from a normal distribution with mean μ instead (deviation).
Fig 6.
Relative error in MSE estimation for individual coefficients when the assumed variance of the marginal distribution of the second feature in parametric simulation deviates from the true variance, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
N(0,1), N(0,σ²) denotes that the first feature is generated from a standard normal (truth), and the second feature is generated from a normal distribution with variance σ² instead (deviation).
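The deviations in Figs 5 and 6 change only the marginal mean or variance of the second feature. One way to sketch this, assuming equicorrelated Gaussian features as the starting point, is to draw correlated standard normals and apply an affine map to the second one; a map with positive scale changes the mean and variance but leaves the correlation untouched. The parameter `sd` stands for the square root of the feature variance σ² in the Fig 6 caption (not the error σ = 0.3), and the function name is hypothetical.

```python
import math
import random

def draw_two_features(n, mu=0.0, sd=1.0, rho=0.2, seed=None):
    """Hypothetical helper: first feature ~ N(0, 1), second ~ N(mu, sd^2),
    Cor = rho. Assumption: shared-factor construction, so rho >= 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        z1 = math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
        z2 = math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
        # Affine map with sd > 0: shifts and scales the second feature only.
        rows.append([z1, mu + sd * z2])
    return rows
```

Setting `mu` away from 0 reproduces the kind of mean deviation in Fig 5; setting `sd` away from 1 reproduces the variance deviation in Fig 6.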
Fig 7.
Relative error in MSE estimation for individual coefficients when the assumed correlation of the features in parametric simulation deviates from the true correlation, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.5 ∀i ≠ j.
Fig 8.
Relative error in MSE estimation for individual coefficients when the assumed correlation of the features in parametric simulation deviates from the true correlation, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
Fig 9.
Relative error in MSE estimation for individual coefficients when the assumed correlation of the features in parametric simulation deviates from the true correlation, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2^|i−j| for the i-th and j-th features within each of the 5 blocks.
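The block structure in Fig 9 can be generated with a first-order autoregressive recursion inside each block, which yields unit variances and Cor(X_i, X_j) = 0.2^|i−j|. This is a sketch under assumptions: the caption does not give the block sizes, so the `[10] * 5` in the usage line is hypothetical, and features in different blocks are assumed to be independent.

```python
import math
import random

def draw_block_ar1_row(block_sizes, rho=0.2, rng=random):
    """Hypothetical helper: one row with Cor(X_i, X_j) = rho**|i - j| inside
    a block; assumption: zero correlation between blocks."""
    row = []
    for size in block_sizes:
        x = rng.gauss(0.0, 1.0)
        row.append(x)
        for _ in range(size - 1):
            # AR(1) step keeps Var = 1 and gives lag-k correlation rho**k.
            x = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            row.append(x)
    return row

# Hypothetical usage: 5 blocks of 10 features, 100 observations.
rng = random.Random(3)
rows = [draw_block_ar1_row([10] * 5, rng=rng) for _ in range(100)]
```

The recursion avoids building and factorising the full correlation matrix, which keeps the sketch dependency-free.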
Fig 10.
Relative error in MSE estimation for individual coefficients when the assumed marginal distribution of the second feature in parametric simulation is misspecified as a Gaussian mixture with an increasing proportion of data drawn from a Gaussian with a different expectation (bimodal distribution).
The mean and the variance of the marginal normal distribution of the first feature are set to match those of the second. The mixing proportion is given on the x-axis.
Fig 11.
Relative error in MSE estimation for individual coefficients when the assumed marginal distribution of the second feature in parametric simulation is misspecified as a Gaussian mixture with an increasing proportion of data drawn from a Gaussian with a different variance (contaminated distribution), for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
The mean and the variance of the marginal normal distribution of the first feature are set to match those of the second. The mixing proportion is given on the x-axis.
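In Figs 10 and 11 the first feature is normal with the same mean and variance as the second, mixture-distributed feature. For a two-component mixture (1 − w)·N(m₁, s₁²) + w·N(m₂, s₂²), the mean is (1 − w)m₁ + wm₂ and the variance is the mixture's second moment minus the squared mean. The sketch below computes these moments and draws both features; the component parameters are hypothetical, and the correlation between the two features is ignored here for simplicity.

```python
import math
import random

def mixture_moments(w, m1, s1, m2, s2):
    """Mean and variance of (1 - w) * N(m1, s1^2) + w * N(m2, s2^2)."""
    mean = (1 - w) * m1 + w * m2
    second_moment = (1 - w) * (s1 ** 2 + m1 ** 2) + w * (s2 ** 2 + m2 ** 2)
    return mean, second_moment - mean ** 2

def draw_mixture(n, w, m1, s1, m2, s2, rng=random):
    """Each value comes from the second component with probability w."""
    return [rng.gauss(m2, s2) if rng.random() < w else rng.gauss(m1, s1)
            for _ in range(n)]

def draw_matched_normal(n, w, m1, s1, m2, s2, rng=random):
    """Normal values with the same mean and variance as the mixture."""
    mean, var = mixture_moments(w, m1, s1, m2, s2)
    return [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
```

With hypothetical values, a bimodal mixture with w = 0.3, (m₁, s₁) = (0, 1), (m₂, s₂) = (3, 1) has mean 0.9 and variance 2.89, while a contaminated mixture with w = 0.1, (m₁, s₁) = (0, 1), (m₂, s₂) = (0, 3) has mean 0 and variance 1.8.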
Fig 12.
Relative error in MSE estimation for individual coefficients when the assumed marginal distribution of the second feature in parametric simulation is misspecified as log-normal, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
The mean and the variance of the marginal normal distribution of the first feature are set to match those of the second.
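For the log-normal case in Fig 12, matching the first feature to the second uses the standard log-normal moments: if Z ∼ N(m, s²), then exp(Z) has mean exp(m + s²/2) and variance (exp(s²) − 1)·exp(2m + s²). A sketch with Python's `random.lognormvariate`; the log-scale parameters and function names are hypothetical, and the two features are drawn independently here rather than with the caption's correlation of 0.2.

```python
import math
import random

def lognormal_moments(m, s):
    """Mean and variance of exp(Z) with Z ~ N(m, s^2)."""
    mean = math.exp(m + s ** 2 / 2)
    var = (math.exp(s ** 2) - 1) * math.exp(2 * m + s ** 2)
    return mean, var

def draw_lognormal_and_matched_normal(n, m=0.0, s=1.0, rng=random):
    """Hypothetical helper: second feature log-normal, first feature normal
    with the same mean and variance (independent draws, a simplification)."""
    mean, var = lognormal_moments(m, s)
    second = [rng.lognormvariate(m, s) for _ in range(n)]
    first = [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
    return first, second
```

The matched normal feature shares the first two moments but not the skewness, which is what distinguishes the misspecified marginal.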
Fig 13.
Relative error in MSE estimation for individual coefficients when the assumed marginal distribution of the second feature in parametric simulation is misspecified as a Bernoulli distribution with different success probabilities, for p = 2, n = 100, β = (1, 1, 1)ᵀ, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
Fig 14.
Absolute value of relative error in MSE estimation averaged over individual coefficients when the assumed coefficients in parametric and Plasmode simulation are misspecified, for p = 50, n = 100, β = 1_51, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j, βI = (0, 0.02, …, 1)ᵀ, βII = 0.05 · 1_51, βIII = 10 · 1_51, βIV = 0_51.
Large outliers for the n-out-of-n bootstrap are not displayed.
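The four misspecified coefficient vectors in Fig 14 can be written down directly for p = 50, with 51 entries each if the intercept is included, as the lengths 1_51 and 0_51 suggest. βI runs from 0 to 1 in steps of 0.02. The function below is a hypothetical sketch; generalising βI to other p as j / p is an assumption, since the caption only covers p = 50.

```python
def misspecified_coefficients(p=50):
    """Hypothetical helper: true and misspecified coefficient vectors of
    length p + 1 (intercept included, an assumption based on 1_51)."""
    k = p + 1
    beta_true = [1.0] * k
    alternatives = {
        "I": [j / p for j in range(k)],  # 0, 0.02, ..., 1 when p = 50
        "II": [0.05] * k,                # 0.05 * 1_51
        "III": [10.0] * k,               # 10 * 1_51
        "IV": [0.0] * k,                 # 0_51
    }
    return beta_true, alternatives
```

βII and βIII scale the true signal down and up, βIV removes it entirely, and βI keeps the true range but spreads the effect sizes.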
Fig 15.
Absolute value of relative error in MSE estimation averaged over individual coefficients when the assumed error variance in parametric and Plasmode simulation is misspecified, for p = 50, n = 100, β = 1_51, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
Large outliers for the n-out-of-n bootstrap are not displayed.
Fig 16.
Absolute value of relative error in MSE estimation averaged over individual coefficients when the assumed error distributions in parametric and Plasmode simulation are misspecified, for p = 50, n = 100, β = 1_51, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
Large outliers for the n-out-of-n bootstrap are not displayed.
Fig 17.
Absolute value of relative error in MSE estimation for individual coefficients when the assumed feature correlation matrix in parametric simulation is misspecified.
The true correlation matrix is estimated from the benchmark dataset quake (p = 3, n = 100, β = 1_4, σ = 0.3).
Fig 18.
Absolute value of relative error in MSE estimation for individual coefficients when the assumed feature correlation matrix in parametric simulation is misspecified.
The true correlation matrix is estimated from the benchmark dataset wine_quality (p = 11, n = 100, β = 1_12, σ = 0.3).
Fig 19.
Absolute value of relative error in MSE estimation averaged over individual coefficients when the assumed feature correlation matrix in parametric simulation is misspecified.
The true correlation matrix is estimated from the benchmark dataset Yolanda (p = 100, n = 200, β = 1_101, σ = 0.3).
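In Figs 17 to 19 the true correlation matrix comes from a benchmark dataset. A stdlib sketch of the two steps this implies, under assumptions: the Pearson correlation matrix is estimated from the dataset's feature rows, and correlated Gaussian features are then drawn as x = Lz, where L is the Cholesky factor of that matrix and z ∼ N(0, I). Loading quake, wine_quality or Yolanda is not shown; `rows` stands for any list of numeric feature rows, and all function names are hypothetical.

```python
import math
import random

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of equal-length rows."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be symmetric positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def draw_correlated_normals(n, corr, rng=random):
    """n rows from N(0, corr) via x = L z with z ~ N(0, I)."""
    L = cholesky(corr)
    p = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rows.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return rows
```

If an estimated matrix is only positive semidefinite (for example, with collinear features), `math.sqrt` receives a non-positive argument and the factorisation fails, so some regularisation would be needed in that case.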
Fig 20.
Comparison of different resampling types for different numbers of observations resampled from a dataset with 100 observations.
Absolute value of relative error in MSE estimation averaged over individual coefficients when the true model is assumed in parametric and Plasmode simulation, for p = 10, n = 100, β = 1_11, σ = 0.3, Cor(X_i, X_j) = 0.2 ∀i ≠ j.
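Fig 20 compares resampling types for Plasmode simulation. The captions name only the n-out-of-n bootstrap; the m-out-of-n bootstrap and subsampling without replacement used below are assumed examples of other resampling types, not a description of the paper's exact set. In this sketch, rows of observed features are resampled and outcomes are then generated from an assumed linear OGM with Gaussian errors; the function names are hypothetical.

```python
import random

def resample_rows(rows, m, replace=True, rng=random):
    """Draw m of the observed rows.
    replace=True: m-out-of-n bootstrap (m == len(rows) is the n-out-of-n bootstrap).
    replace=False: subsampling without replacement (requires m <= len(rows))."""
    if replace:
        return [rng.choice(rows) for _ in range(m)]
    return rng.sample(rows, m)

def plasmode_outcomes(rows, beta, sigma=0.3, rng=random):
    """Assumed OGM on resampled features: y = beta_0 + x^T beta_{1:p} + eps."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) + rng.gauss(0.0, sigma)
            for row in rows]
```

Sampling with replacement repeats some observations and omits others, while subsampling keeps each drawn observation unique, which is why the two types can behave differently as the number of resampled observations approaches 100.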
Table 3.
Smallest deviations in parametric simulations for which Plasmode simulation is superior to parametric simulation.
p denotes the number of features and n the number of observations. True ρ gives the true correlation structure, scenario type the type of deviation, and true value the true parameter value to which the deviation refers.