PERT: A Method for Expression Deconvolution of Human Blood Samples from Varied Microenvironmental and Developmental Conditions

doi:10.1371/journal.pcbi.1002838

Figure 1.

Schematic of deconvolution models.

(A) Generation of mixed profiles from heterogeneous samples. (A-i) represents a heterogeneous sample whose composition is unknown. Each bar in (A-ii) represents the expression level of an individual gene in the heterogeneous sample. (B) Schematic of four deconvolution models. (B-i) The non-negative least squares model (NNLS; Lawson and Hanson, 1995) and the non-negative maximum likelihood model (NNML) predict proportions of pre-specified reference populations in a heterogeneous sample using mixed and reference profiles. (B-ii) The non-negative maximum likelihood new population model (NNML_np) estimates the gene expression profile of a new reference population that may exist in a heterogeneous sample; simultaneously, the model predicts proportions of both the input reference populations and the new reference population. (B-iii) The perturbation model (PERT) perturbs the input reference profiles using a genome-wide perturbation vector ρ; simultaneously, the model predicts proportions of the reference populations in a heterogeneous sample. Parameters shown in red are predicted by the model.
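
The NNLS step in (B-i) can be illustrated with a minimal sketch. It assumes that the mixed profile is a non-negative linear combination of the reference profiles and uses SciPy's `scipy.optimize.nnls` solver. The function name `deconvolve_nnls`, the synthetic data, and the rescaling of coefficients to sum to one are illustrative assumptions, not details taken from the article.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(reference, mixed):
    """Estimate non-negative proportions p with reference @ p ~ mixed.

    reference: (genes x populations) matrix of reference profiles.
    mixed:     (genes,) expression profile of the heterogeneous sample.
    """
    coef, _residual_norm = nnls(reference, mixed)
    total = coef.sum()
    # Assumption: rescale so the coefficients read as proportions summing to 1.
    return coef / total if total > 0 else coef


# Synthetic, noise-free example (hypothetical data, not from the article).
rng = np.random.default_rng(0)
reference = rng.uniform(1.0, 10.0, size=(50, 4))  # 50 genes, 4 populations
true_props = np.array([0.4, 0.3, 0.2, 0.1])
mixed = reference @ true_props

estimated = deconvolve_nnls(reference, mixed)
print(np.round(estimated, 3))
```

With noise-free input and linearly independent reference profiles, the estimate should match the proportions used to build the mixture. Real profiles are noisy, which is the situation the likelihood-based models in (B) are meant to address.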

Figure 2.

NNML recovers known compositions of immune cell line mixtures.

Microarray data for IM-9 (○), Jurkat (▵), Raji (□), THP-1 (+), and mixtures of these four cell lines in known proportions were obtained from Abbas et al. (2009). Proportions of each cell line were predicted using (A) NNLS with cell line signature probes (reproduced from Abbas et al. (2009)), (B) NNLS without cell line signature probes, (C) NNML with cell line signature probes, and (D) NNML without cell line signature probes. Model predictions were compared with the input proportions used to create the mixtures. Cell line signature probes were obtained from Abbas et al. (2009).

Figure 3.

PERT captures cell culture effects.

(A) Experimental setup for genome-wide gene expression profiling of uncultured (day-0) and culture-derived (day-4) colony forming unit-monocytes (CFU-M) and megakaryocytes (MEGA). Lin-: lineage-depleted cells; TPO: thrombopoietin; SCF: stem cell factor; FLT3LG: fms-related tyrosine kinase 3 ligand. (B) Pearson's correlation comparison between day-0 and day-4 samples. (C) Plots of Gene Ontology enrichment analysis showing the enrichment scores of cell cycle phase genes, immune response genes, and inflammatory response genes in day-4 samples compared with day-0 samples. NES denotes the normalized enrichment score. P-values (P) were calculated using the hypergeometric test. (D) Pearson's correlation comparison between day-0 CFU-M, day-4 CFU-M, and perturbed day-0 CFU-M (that is, model-predicted day-4 CFU-M) gene expression profiles. (E) Pearson's correlation comparison between day-0 megakaryocyte, day-4 megakaryocyte, and perturbed day-0 megakaryocyte (that is, model-predicted day-4 megakaryocyte) gene expression profiles.

Figure 4.

PERT recovers compositions of uncultured human cord blood mono-nucleated and lineage-depleted (Lin-) cells.

(A) Schematic compositions of mono-nucleated cell samples and Lin- cell samples. (B) Model-predicted proportions of 11 homogeneous blood cell lineages, namely granulocytes (GRAN), erythrocytes (ERY), monocytes (MONO), precursor B cells (PREB), megakaryocyte-erythrocyte progenitors (MEP), megakaryocytes (MEGA), primitive progenitor cells (PPC), eosinophils (EOS), granulocyte-monocyte progenitors (GMP), common myeloid progenitors (CMP), and basophils (BASO), in uncultured human mono-nucleated cord blood cell samples. (C) Proportions of the 11 blood cell lineages measured by flow cytometry in the uncultured human mono-nucleated cord blood cell samples shown in (B). (D) Model-predicted proportions in uncultured human Lin- cord blood cell samples. (E) Proportions measured by flow cytometry in the uncultured human Lin- cord blood cell samples shown in (D). (F) R² calculated from the Pearson's correlation coefficients between the model-predicted cell proportions and those assigned by flow cytometry. See Table 2 for the associated t-statistics and P-values. (G) Averaged absolute differences between model-predicted and flow cytometry-assigned cell proportions. Error bars show standard deviations of these absolute differences across the 11 blood cell lineages. (H) The Bayesian information criterion (BIC) calculated from the parameters in Table 1.
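
The agreement measures in (F) and (G) can be sketched as follows. This is a minimal illustration that takes R² as the square of Pearson's correlation coefficient and summarizes absolute differences by their mean and standard deviation. The function names, the example proportions, and the choice of the sample standard deviation (`ddof=1`) are assumptions made for illustration and are not taken from the article.

```python
import numpy as np


def r_squared(predicted, measured):
    """Square of Pearson's correlation coefficient between two vectors."""
    r = np.corrcoef(predicted, measured)[0, 1]
    return r ** 2


def abs_difference_summary(predicted, measured):
    """Mean and standard deviation of per-lineage absolute differences."""
    diffs = np.abs(np.asarray(predicted) - np.asarray(measured))
    # Assumption: sample standard deviation (ddof=1); the caption does not say.
    return diffs.mean(), diffs.std(ddof=1)


# Hypothetical proportions for five lineages (illustrative values only).
predicted = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
measured = np.array([0.28, 0.27, 0.18, 0.17, 0.10])

r2 = r_squared(predicted, measured)
mean_diff, sd_diff = abs_difference_summary(predicted, measured)
print(f"R^2 = {r2:.3f}, mean |diff| = {mean_diff:.3f}, SD = {sd_diff:.3f}")
```

The two measures are complementary: R² captures whether predicted and measured proportions rise and fall together, while the absolute differences capture how far apart they are in magnitude.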

Table 1.

Parameters of NNML, NNML_np and PERT for the Bayesian information criterion (BIC) calculations shown in Figure 4H and Figure 5F.
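
For orientation, the standard definition of the BIC is k·ln(n) − 2·ln(L̂), where k is the number of free parameters, n the number of observations, and L̂ the maximized likelihood; lower values indicate a better trade-off between fit and model complexity. The sketch below applies this general formula only. The function name and all numbers are hypothetical, and the article's own likelihood and parameter counts (Table 1) are not reproduced here.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical comparison of two models fitted to the same 1000 observations.
simple_model = bic(log_likelihood=-120.0, n_params=5, n_obs=1000)
richer_model = bic(log_likelihood=-100.0, n_params=6, n_obs=1000)

# The model with the lower BIC is preferred under this criterion.
preferred = "richer" if richer_model < simple_model else "simple"
print(round(simple_model, 2), round(richer_model, 2), preferred)
```

In this made-up case the extra parameter is worth its penalty because the gain in log-likelihood outweighs the additional ln(n) term.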

Table 2.

Associated statistics for the Pearson's correlation analysis between the model predicted and flow cytometry assigned cell proportions for uncultured mono-nucleated and lineage-depleted cell samples enriched from fresh human umbilical cord blood.

Figure 5.

PERT recovers compositions of culture-derived lineage-depleted (Lin-) human blood cells.

(A) Schematic of the experimental setup. (B) Model-predicted cell proportions of 11 blood cell lineages (defined in Figure 4) in day-4 Lin- human blood cell samples. (C) Averaged cell proportions (N = 3) assigned by flow cytometry in the day-4 Lin- human blood cell samples shown in (B). (D) R² calculated from the Pearson's correlation coefficients between the model-predicted cell proportions and those assigned by flow cytometry. (E) Averaged absolute differences between model-predicted and flow cytometry-assigned cell proportions. Error bars show standard deviations of these absolute differences across the 11 blood cell lineages. (F) The Bayesian information criterion (BIC) calculated from the parameters in Table 1.
