Fig 1.
(1) Initial dataset with missing values. It is assumed to consist of N observations split into K groups. (2) Single imputation provides an imputed dataset. (3) The vector of parameters of interest is estimated from this single imputed dataset.
Fig 2.
(1) Initial dataset with missing values. It is assumed to consist of N observations split into K groups. (2) Multiple imputation provides D estimators of the vector of parameters of interest. (3a) The D estimators are combined using Rubin’s first rule to obtain the combined estimator. (3b) The estimator of the variance-covariance matrix of the combined estimator is given by Rubin’s second rule.
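As an illustration of steps (3a) and (3b), the sketch below pools D estimates using the textbook form of Rubin’s rules: the combined estimator is the average of the D estimators, and its variance-covariance matrix is the average within-imputation covariance plus (1 + 1/D) times the between-imputation covariance. The function name `rubin_combine` and its arguments are hypothetical, and the sketch assumes that each imputed dataset also yields a covariance matrix for its estimator; it is not the mi4p implementation.

```python
def rubin_combine(estimates, covariances):
    """Pool D estimates (length-p lists) and their p x p covariance matrices.

    Hypothetical helper illustrating the textbook Rubin's rules.
    """
    d = len(estimates)
    if d < 2:
        raise ValueError("at least two imputed datasets are required")
    p = len(estimates[0])

    # First rule: the combined estimator is the mean of the D estimators.
    combined = [sum(est[j] for est in estimates) / d for j in range(p)]

    # Within-imputation covariance: mean of the D covariance matrices.
    within = [[sum(cov[i][j] for cov in covariances) / d for j in range(p)]
              for i in range(p)]

    # Between-imputation covariance: sample covariance of the D estimators.
    between = [[sum((est[i] - combined[i]) * (est[j] - combined[j])
                    for est in estimates) / (d - 1)
                for j in range(p)]
               for i in range(p)]

    # Second rule: total covariance = within + (1 + 1/D) * between.
    total = [[within[i][j] + (1 + 1 / d) * between[i][j] for j in range(p)]
             for i in range(p)]
    return combined, total
```

With D = 2 estimates `[1.0, 2.0]` and `[3.0, 4.0]`, each with an identity covariance matrix, the combined estimator is `[2.0, 3.0]` and the between-imputation term inflates the pooled covariance beyond the identity.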
Table 1.
Overview of the imputation methods considered in this work.
Fig 3.
Workflow used to evaluate the performance of the mi4p methodology and to compare it with the one implemented in the DAPAR R package.
Fig 4.
Distribution of empirical errors for the five imputation methods considered on the second set of MAR simulations.
Fig 5.
Distribution of errors of the averaged imputed values for the five imputation methods considered on the second set of MAR simulations.
Fig 6.
Distribution of the duration of the imputation process for the five imputation methods considered on the second set of MAR simulations.
Table 2.
Number of pathological cases for each missing value proportion in the second set of MAR simulations.
Fig 7.
Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the first set of MAR simulations.
Missing values were imputed using the maximum likelihood estimation method.
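The five indicators compared in Figs 7 to 11 have standard definitions in terms of true and false positives and negatives. The sketch below computes them from confusion-matrix counts and then takes their difference between two workflows. It assumes that the F-score is the F1 score and that the difference is taken as first workflow minus second; the function names and the sign convention are hypothetical, not taken from the original work.

```python
import math


def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard indicators computed from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
    }


def metric_differences(first, second):
    """Difference of each indicator (assumed convention: first minus second)."""
    return {name: first[name] - second[name] for name in first}
```

For example, `metric_differences(classification_metrics(8, 2, 88, 2), classification_metrics(6, 4, 86, 4))` gives a positive value for every indicator, meaning the first set of counts performs better under this sign convention.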
Fig 8.
Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the second set of MAR simulations.
Missing values were imputed using the maximum likelihood estimation method.
Fig 9.
Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the third set of MAR simulations.
Missing values were imputed using the maximum likelihood estimation method.
Fig 10.
Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the first set of MCAR + MNAR simulations.
Missing values were imputed using the maximum likelihood estimation method.
Fig 11.
Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the second set of MCAR + MNAR simulations.
Missing values were imputed using the maximum likelihood estimation method.
Table 3.
Performance of the mi4p methodology, expressed as a percentage relative to the DAPAR workflow, on the Saccharomyces cerevisiae + UPS1 experiment, with Match Between Runs and at least 1 out of 3 quantified values in each condition.
Missing values (6%) were imputed using the maximum likelihood estimation method.
Table 4.
Performance of the mi4p methodology, expressed as a percentage relative to the DAPAR workflow, on the Arabidopsis thaliana + UPS1 experiment, with at least 1 out of 3 quantified values in each condition.
Missing values (6%) were imputed using the maximum likelihood estimation method.
Table 5.
Performance of the mi4p methodology (with the aggregation step), expressed as a percentage relative to the DAPAR workflow, on the Saccharomyces cerevisiae + UPS1 experiment, with at least 1 out of 3 quantified values in each condition.
Missing values were imputed using the maximum likelihood estimation method.
Table 6.
Performance of the mi4p methodology (with the aggregation step), expressed as a percentage relative to the DAPAR workflow, on the Arabidopsis thaliana + UPS1 experiment, with at least 1 out of 3 quantified values in each condition.
Missing values were imputed using the maximum likelihood estimation method.