Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics | PLOS Computational Biology

Fig 1.

Single imputation strategy.
(1) Initial dataset with missing values. It is assumed to consist of N observations split into K groups. (2) Single imputation provides one imputed dataset. (3) The vector of parameters of interest is estimated from the single imputed dataset.

Fig 2.

Multiple imputation strategy.
(1) Initial dataset with missing values. It is assumed to consist of N observations split into K groups. (2) Multiple imputation provides D estimators of the vector of parameters of interest. (3a) The D estimators are combined using Rubin’s first rule to obtain the combined estimator. (3b) The variance-covariance matrix of the combined estimator is estimated using Rubin’s second rule.
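
Steps (3a) and (3b) of this caption can be illustrated with a minimal sketch of the standard form of Rubin's rules. It assumes that each of the D imputed datasets yields a parameter estimate and an estimate of its variance-covariance matrix. The function name `rubin_combine` and the plain-list data layout are hypothetical choices for illustration and are not taken from the mi4p package.

```python
def rubin_combine(estimates, covariances):
    """Pool D estimates (lists of length p) and their p x p covariance matrices.

    Hypothetical helper illustrating the standard Rubin's rules;
    it is not the mi4p implementation.
    """
    D = len(estimates)
    if D < 2:
        raise ValueError("at least two imputed datasets are required")
    p = len(estimates[0])

    # First rule: the combined estimator is the average of the D estimates.
    pooled = [sum(est[j] for est in estimates) / D for j in range(p)]

    # Within-imputation component: average of the D covariance matrices.
    within = [[sum(cov[i][j] for cov in covariances) / D for j in range(p)]
              for i in range(p)]

    # Between-imputation component: sample covariance of the D estimates.
    between = [[sum((est[i] - pooled[i]) * (est[j] - pooled[j])
                    for est in estimates) / (D - 1)
                for j in range(p)]
               for i in range(p)]

    # Second rule: total covariance = within + (1 + 1/D) * between.
    total = [[within[i][j] + (1 + 1 / D) * between[i][j] for j in range(p)]
             for i in range(p)]
    return pooled, total


# Toy usage with D = 2 imputations and p = 2 parameters (made-up numbers).
identity = [[1.0, 0.0], [0.0, 1.0]]
pooled, total = rubin_combine([[1.0, 2.0], [3.0, 4.0]], [identity, identity])
```

The between-imputation term is what a single imputation strategy cannot provide: with one imputed dataset, only the within-imputation variance is available.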

Table 1.

Overview of the imputation methods considered in this work.

Fig 3.

Workflow conducted for performance evaluation of the mi4p methodology and comparison to the one implemented in the DAPAR R package.

Fig 4.

Distribution of empirical errors for the five imputation methods considered on the second set of MAR simulations.

Fig 5.

Distribution of errors of the averaged imputed values for the five imputation methods considered on the second set of MAR simulations.

Fig 6.

Distributions of the duration of the imputation process for the five imputation methods considered on the second set of MAR simulations.

Table 2.

Number of pathological cases for each missing value proportion in the second set of MAR simulations.

Fig 7.

Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the first MAR set of simulations.
Missing values were imputed using the maximum likelihood estimation method.
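
The five performance indicators named in this caption have standard definitions in terms of confusion-matrix counts. The sketch below computes them for a single workflow; it assumes that "F-score" means the F1 score, and the function name `classification_metrics` and its arguments are hypothetical. The figures report differences of such indicators, which would be obtained by subtracting the values computed for two workflows.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard indicators from confusion-matrix counts.

    Hypothetical helper for illustration; undefined ratios are returned as NaN.
    """
    def ratio(num, den):
        return num / den if den != 0 else float("nan")

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)     # positive predictive value
    # F1 score (assumed meaning of "F-score"): harmonic mean of precision and recall.
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient.
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
    }


# Toy usage with made-up counts: a perfect classifier and an uninformative one.
perfect = classification_metrics(tp=5, fp=0, tn=5, fn=0)
chance = classification_metrics(tp=1, fp=1, tn=1, fn=1)
```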

Fig 8.

Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the second MAR set of simulations.
Missing values were imputed using the maximum likelihood estimation method.

Fig 9.

Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the third MAR set of simulations.
Missing values were imputed using the maximum likelihood estimation method.

Fig 10.

Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the first MCAR + MNAR set of simulations.
Missing values were imputed using the maximum likelihood estimation method.

Fig 11.

Distributions of differences in sensitivity, specificity, precision, F-score and Matthews correlation coefficient for the second MCAR + MNAR set of simulations.
Missing values were imputed using the maximum likelihood estimation method.

Table 3.

Performance of the mi4p methodology, expressed as a percentage relative to the DAPAR workflow, on the Saccharomyces cerevisiae + UPS1 experiment, with Match Between Runs and at least 1 out of 3 quantified values in each condition.
Missing values (6%) were imputed using the maximum likelihood estimation method.

Table 4.

Performance of the mi4p methodology, expressed as a percentage relative to the DAPAR workflow, on the Arabidopsis thaliana + UPS1 experiment, with at least 1 out of 3 quantified values in each condition.
Missing values (6%) were imputed using the maximum likelihood estimation method.

Table 5.

Performance of the mi4p methodology (with the aggregation step), expressed as a percentage relative to the DAPAR workflow, on the Saccharomyces cerevisiae + UPS1 experiment, with at least 1 out of 3 quantified values in each condition.
Missing values were imputed using the maximum likelihood estimation method.

Table 6.

Performance of the mi4p methodology (with the aggregation step), expressed as a percentage relative to the DAPAR workflow, on the Arabidopsis thaliana + UPS1 experiment, with at least 1 out of 3 quantified values in each condition.
Missing values were imputed using the maximum likelihood estimation method.
