Fig 1.

Overview of Pareto archetype analysis of single-cell datasets to discover polytopes in gene expression space and infer tasks.

Datasets from different human and mouse tissues, generated by different groups with different technologies, were analyzed by Pareto archetype analysis. Best-fit polytopes and their statistical significance were determined. Tasks were inferred from the genes maximally enriched in the cells closest to each vertex of the polytope.

Fig 2.

Human colon crypt cells fall in a tetrahedron in gene expression space.

(a) For k = 2–11 we found the polytope with k vertices that best fit the data using the PCHA algorithm, considering all 76 dimensions. The explained variance of the best-fit polytopes begins to saturate at k = 4 or k = 5 vertices. (b) Comparison of the variance explained by the first k principal components of the data with the variance explained by the first k principal components of shuffled data suggests that the effective data dimensionality is three or four. Blue line: variance explained by PCA of intestinal data. Green line: variance explained by PCA of shuffled data. Points represent mean values. Error bars, representing 5%-95% variation intervals, are smaller than the line width. Points for which the real data EV is higher than the randomized data EV are marked with *. (c) Data plotted on the first 3 PC axes resemble a tetrahedron, and (d)-(f) their projections on the principal planes resemble triangles. Archetypes and their variation upon data resampling (bootstrapping) are shown as colored ellipses (see S1B Text). Thin lines: tetrahedron edges.
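The comparison in panel (b) can be illustrated with a minimal sketch. It is not the authors' code and rests on stated assumptions: shuffling is taken to mean permuting each gene independently across cells, variance is compared per component rather than cumulatively, and the threshold is the 95th percentile of the shuffled values. The function names `explained_variance`, `shuffle_genes` and `effective_dimension` are hypothetical.

```python
import numpy as np

def explained_variance(data):
    """Fraction of total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to the variance along each PC.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def shuffle_genes(data, rng):
    """Assumed null model: permute each gene (column) independently across cells."""
    shuffled = data.copy()
    for j in range(shuffled.shape[1]):
        shuffled[:, j] = rng.permutation(shuffled[:, j])
    return shuffled

def effective_dimension(data, n_shuffles=20, seed=0):
    """Count leading PCs whose explained variance exceeds the shuffled data.

    Assumption: a PC counts as significant when its explained variance is
    above the 95th percentile of the same-rank PC across the shuffles.
    """
    rng = np.random.default_rng(seed)
    real = explained_variance(data)
    null = np.array([explained_variance(shuffle_genes(data, rng))
                     for _ in range(n_shuffles)])
    upper = np.percentile(null, 95, axis=0)
    exceeds = real > upper
    # Index of the first PC that fails the test, or all PCs if none fails.
    return int(np.argmin(exceeds)) if not exceeds.all() else len(real)
```

Here `data` is a cells-by-genes array. Shuffling preserves each gene's marginal distribution while destroying correlations between genes, so any PC that explains more variance than its shuffled counterpart reflects shared structure across genes.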

Fig 3.

Expression profiles of the four colon crypt archetypes are each enriched for markers of specific cell types.

(a) The expression profiles of the four archetypes, with enriched genes colored. Enriched genes were determined by leave-1-out enrichment analysis: cells were binned according to their distance from each archetype, and a gene was considered enriched when its average expression was maximal in the bin closest to the archetype, as described in Methods: 1D Gene enrichment at archetypes (see the full list of enriched genes in S2 Table). Light blue: enterocyte archetype; yellow: nodal archetype; green: stem cell archetype; red: goblet cell archetype. Genes that are not enriched, or are enriched in more than one archetype, are in dark blue. Zero level represents the average expression of each gene in the dataset. (b) Leave-1-out enrichment plot: expression of a gene (SLC26A3, an enterocyte marker) as a function of distance from each archetype in equal-mass bins of cells (Methods: 1D Gene enrichment at archetypes); line color indicates archetype. This gene is maximally enriched only at the enterocyte archetype (blue line). For enrichment plots of additional genes see S5 Fig. (c) A two-dimensional enrichment plot of SLC26A3, in which its expression is plotted on the plane of the first 2 PCs of the data, indicating that expression is maximal in the cells closest to the enterocyte archetype. Contours are expression density estimated using a Gaussian kernel (Methods: 2D Gene enrichment at archetypes). Archetype positions and PCs were calculated without the tested gene.
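The binning step behind the enrichment plot in panel (b) might look like the following sketch. It covers only the equal-mass binning and the maximal-in-closest-bin check. The distances are assumed to have been computed beforehand from archetypes fitted without the tested gene, and the statistical tests described in the Methods are omitted. The names `binned_expression` and `is_maximal_at_archetype`, and the default of 10 bins, are illustrative assumptions.

```python
import numpy as np

def binned_expression(expression, distances, n_bins=10):
    """Mean expression of one gene in equal-mass bins of cells.

    Cells are sorted by distance from an archetype and split into bins
    holding (nearly) equal numbers of cells; bin 0 is closest.
    """
    expression = np.asarray(expression, dtype=float)
    order = np.argsort(distances)
    bins = np.array_split(order, n_bins)
    return np.array([expression[idx].mean() for idx in bins])

def is_maximal_at_archetype(expression, distances, n_bins=10):
    """True when the bin closest to the archetype has the highest mean."""
    means = binned_expression(expression, distances, n_bins)
    return bool(np.argmax(means) == 0)
```

Calling `is_maximal_at_archetype` once per archetype, each time with the distances to that archetype, gives a simple way to flag a gene as enriched at exactly one archetype, as with SLC26A3 in this figure.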

Fig 4.

Progenitor crypt cells fall in a tetrahedron.

(a) Enterocytes, goblet cells and nodal cells analyzed separately do not form significant polytopes. Cells are color coded by type in the tetrahedron of Fig 3, and each cell class is plotted on its own first 3 PCs. (b) Progenitor cells analyzed separately fall uniformly in a tetrahedron. The best-fit tetrahedron is shown (PCHA delta = 0.5). The arrow represents the direction of development according to Axin2 levels (see S7b Fig). Also shown are projections on the principal planes, which resemble triangles or quadrangles. Archetypes and their variation upon data resampling (bootstrapping) are shown as gray ellipses. (c) Explained variance as a function of polytope order k and as a function of the number of PCs D both suggest a tetrahedron (k = 4, D = 3).

Fig 5.

Mouse and human colon lower crypt cells fall on similar triangles and show a similar distribution within the triangle.

(a) Mouse colon lower crypt cell dataset from [37], plotted on the plane of its first 2 PCs. Inset: PCA explained variance analysis suggests k = 3 vertices and D = 2 dimensions, namely a triangle. (b) Human lower crypt dataset plotted on the plane of its first 2 PCs. Inset: PCA explained variance analysis. The 24 genes common to the two datasets were used. Arrow indicates the projection of the nodal cell archetype on the triangle.

Fig 6.

Different tissues analyzed by different single-cell technologies show polytopes and tasks.

(a) Human bone marrow cells analyzed by single-cell mass cytometry [13,25], in which proteins are detected using mass-tagged antibodies, are well described by a 4D simplex (a polytope with 5 vertices). The simplex is shown projected on the first 3 PCs; for other projections see S12c Fig. The archetypes correspond to cell types as indicated. Cell density peaks near each archetype. (b) Mouse spleen LPS-stimulated dendritic cells analyzed by single-cell RNA-Seq [3] are well described by a tetrahedron. Archetypes are labeled with functions inferred from genes maximally enriched in cells near each archetype. For more details see S1G and S1H Text.
