Fig 1.
Prediction r2 values for simulation 1.
Error bars represent the standard deviation of the r2 value across 20 replications.
Fig 2.
Prediction r2 values for simulation 2.
Error bars represent the standard deviation of the r2 value across 20 replications.
Fig 3.
Prediction r2 values for simulation 3.
Error bars represent the standard deviation of the r2 value across 20 replications.
Fig 4.
Predictive r2 on out-of-sample data for TlpSum and LassoSum for each of the 100 replications at each of the four simulation settings.
The line in each panel is a 45-degree reference line through the origin, not a line of best fit. Points below the line indicate better performance of TlpSum.
Fig 5.
Predictive r2 on out-of-sample data for TlpSum and ElastSum for each of the 100 replications at each of the four simulation settings.
The line in each panel is a 45-degree reference line through the origin, not a line of best fit. Points below the line indicate better performance of TlpSum.
Fig 6.
Number of nonzero effect sizes estimated by the three penalized regression methods as compared to the true number of nonzero effects, for the three sparse simulation settings.
Fig 7.
Number of true positives for the three penalized regression methods in the three sparse simulation settings.
Fig 8.
Precision of estimated nonzero effect sizes for the penalized regression methods applied to the three sparse simulation settings.
Fig 9.
Performance of the seven different model selection methods applied to a set of candidate LassoSum models.
Performance is measured by r2 on the testing data (the right bar in each group), and by squared quasi-correlation on the testing data (the left bar in each group). Error bars represent the standard deviation across 20 replications.
Fig 10.
Number of estimated nonzero effects for each model selection method in each of the simulation settings of simulation 1.
Models were selected from a set of candidate LassoSum models.
Fig 11.
Performance of the selected models for each of the model selection methods across the different simulation settings of simulation 1, as measured by precision, recall, and F1 score.
The leftmost box in each grouping of three corresponds to pseudo AIC, the center corresponds to pseudo BIC, and the rightmost corresponds to pseudovalidation. Models were selected from a set of candidate LassoSum models.
Table 1.
Median sample size for each study in the lipid analysis.
Table 2.
Model performance for each model selection method, as measured by the quasi-correlation of the model's predictions on the BioBank data.
Models were estimated via TlpSum on the Teslovich data.