Fig 1.
Crocs dataset, Anatomical bias. A-C. Visualization of imputation of complete cases with simulated missing values for A. Probabilistic PCA (PPCA) imputation, B. Mean imputation, C. KNN imputation. In all three panels, blue marks complete cases. Simulated cases with missing values are shown in red if they would be rejected and in green if they would be accepted at an estimated error <0.03. Marker size represents the actual imputation error. D-F. Imputation errors for all complete cases (rows) under all simulated missingness patterns (columns) for D. PPCA imputation, E. Mean imputation, F. KNN imputation. G-H. Feature weights in the first and second principal components. I. Learning curve showing root mean square error (RMSE) as a function of the number of included cases, with 100 replications at each step. RMSE was calculated for both the estimated errors and the actual imputation errors.
Fig 2.
Crocs dataset, Species bias. A-C. Visualization of imputation of complete cases with simulated missing values for A. Probabilistic PCA (PPCA) imputation, B. Mean imputation, C. KNN imputation. In all three panels, blue marks complete cases. Simulated cases with missing values are shown in red if they would be rejected and in green if they would be accepted at an estimated error <0.03. Marker size represents the actual imputation error. D-F. Imputation errors for all complete cases (rows) under all simulated missingness patterns (columns) for D. PPCA imputation, E. Mean imputation, F. KNN imputation. G-H. Feature weights in the first and second principal components. I. Learning curve showing root mean square error (RMSE) as a function of the number of included cases, with 100 replications at each step. RMSE was calculated for both the estimated errors and the actual imputation errors.
Fig 3.
Grid search to estimate γ and δ for each dataset. Optimal parameters are indicated with a red circle. A-C. Crocs dataset with anatomical bias imputed with PPCA (A), mean imputation (B), or KNN imputation (C). D-F. Crocs dataset with species bias imputed with PPCA (D), mean imputation (E), or KNN imputation (F). G-I. Echocardiogram dataset imputed with PPCA (G), mean imputation (H), or KNN imputation (I). J-L. Chronic kidney disease dataset imputed with PPCA (J), mean imputation (K), or KNN imputation (L).
Fig 4.
Echocardiogram dataset. A-C. Visualization of imputation of complete cases with simulated missing values for A. Probabilistic PCA (PPCA) imputation, B. Mean imputation, C. KNN imputation. In all three panels, blue marks complete cases. Simulated cases with missing values are shown in red if they would be rejected and in green if they would be accepted at an estimated error <0.03. Marker size represents the actual imputation error. D-F. Imputation errors for all complete cases (rows) under all simulated missingness patterns (columns) for D. PPCA imputation, E. Mean imputation, F. KNN imputation. G-H. Feature weights in the first and second principal components. I. Learning curve showing root mean square error (RMSE) as a function of the number of included cases, with 100 replications at each step. RMSE was calculated for both the estimated errors and the actual imputation errors.
Fig 5.
Chronic kidney disease dataset. A-C. Visualization of imputation of complete cases with simulated missing values for A. Probabilistic PCA (PPCA) imputation, B. Mean imputation, C. KNN imputation. In all three panels, blue marks complete cases. Simulated cases with missing values are shown in red if they would be rejected and in green if they would be accepted at an estimated error <0.03. Marker size represents the actual imputation error. D-F. Imputation errors for all complete cases (rows) under all simulated missingness patterns (columns) for D. PPCA imputation, E. Mean imputation, F. KNN imputation. G-H. Feature weights in the first and second principal components. I. Learning curve showing root mean square error (RMSE) as a function of the number of included cases, with 100 replications at each step. RMSE was calculated for both the estimated errors and the actual imputation errors.