Quantifying the clusterness and trajectoriness of single-cell RNA-seq data

doi:10.1371/journal.pcbi.1011866

Fig 1.

Visualizations of datasets where clustering and trajectory inference produce similar interpretations.

(a, b) Two datasets with clear trajectory-like visualizations from both clustering and trajectory inference. (c, d) Two datasets with clear clustering-like visualizations from both clustering and trajectory inference.

Fig 2.

Visualizations of datasets where clustering and trajectory inference produce different interpretations.

When applied to these four datasets, clustering analysis showed distinct clusters, while trajectory inference showed an uninterrupted continuum, as shown in the two panels for each of the four cases.

Fig 3.

Overview of the proposed pipeline and simulated data used.

(a) Given a dataset, five different scoring metrics are used to quantify it, producing five numerical scores. (b) Scatter plots visualizing a few examples of the simulated datasets, which are two-dimensional. (c) A large number of simulated datasets were scored by the five metrics, and the resulting scores were projected into UMAP space.

Fig 4.

Proposed scores show meaningful differences between cluster-like and trajectory-like datasets.

For each scoring metric, a violin plot shows the scores across simulated clear cluster data (n = 3000), simulated clear trajectory data (n = 3000), simulated noisy cluster data (n = 3000), and simulated noisy trajectory data (n = 3000). Blue solid lines represent the medians of the distributions. All scoring metrics exhibit meaningful differences across the four types of simulated data.

Fig 5.

Simulated geometric landscape of clusterness and trajectoriness.

(a) Each dot in the UMAP represents one simulated dataset and is visualized by the scatter plot of that dataset. (b) Dots in the UMAP plot are colored by the proportion of neighboring dots belonging to clear or noisy trajectory-like data. The boundary on the colored UMAP shows the separation between cluster-like and trajectory-like datasets. (c) Visualizations colored by the values of each of the five scoring metrics, showing how the scores vary across the UMAP space. Red represents higher values, and blue represents lower values.
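
The coloring in panel (b) can be illustrated with a small sketch. This is not the authors' code: the neighborhood size `k`, the Euclidean distance, the label strings, and the function name `trajectory_neighbor_fraction` are all assumptions made for illustration, and the toy coordinates are invented rather than taken from the paper.

```python
import math

def trajectory_neighbor_fraction(points, labels, k=3):
    """For each 2-D point, return the fraction of its k nearest neighbors
    (Euclidean distance, excluding the point itself) that carry a
    trajectory-like label. k and the label names are assumptions."""
    trajectory_labels = {"clear_trajectory", "noisy_trajectory"}
    fractions = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        hits = sum(labels[j] in trajectory_labels for j in neighbors)
        fractions.append(hits / len(neighbors))
    return fractions

# Toy embedding (made-up values): three trajectory-like dots near the origin
# and three cluster-like dots far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = ["clear_trajectory", "noisy_trajectory", "clear_trajectory",
          "clear_cluster", "noisy_cluster", "clear_cluster"]

fractions = trajectory_neighbor_fraction(points, labels, k=2)
print(fractions)
```

In this toy case every dot sits among dots of its own kind, so the fractions are 1.0 for the trajectory-like group and 0.0 for the cluster-like group. Intermediate values would mark the boundary region described in the caption.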

Fig 6.

Projections of scRNA-seq datasets onto the simulated geometric landscape.

(a) Each red triangle represents the projection of one of the 169 real scRNA-seq datasets onto the simulated geometric landscape. (b) Violin plots of the scoring metrics in the simulated data versus the real scRNA-seq data, showing that the distributions of scores were similar between simulated and real datasets. (c) A tabular summary of the presumed geometric intuition and the predicted geometric property for the 169 datasets, showing roughly 70% agreement.
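
The agreement figure in panel (c) is a simple match rate between two label lists. The sketch below shows that arithmetic on invented labels. The label names and the function `agreement_rate` are hypothetical, and the five toy entries are not data from the paper.

```python
def agreement_rate(presumed, predicted):
    """Fraction of datasets whose presumed geometric label equals the
    predicted one. Both inputs are equal-length sequences of labels."""
    if len(presumed) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not presumed:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(presumed, predicted))
    return matches / len(presumed)

# Made-up labels for five hypothetical datasets (illustration only).
presumed = ["cluster", "trajectory", "cluster", "trajectory", "cluster"]
predicted = ["cluster", "trajectory", "trajectory", "trajectory", "cluster"]

rate = agreement_rate(presumed, predicted)
print(f"agreement: {rate:.0%}")
```

Here four of the five toy labels match, giving 80% agreement. The same calculation over the table in panel (c) would yield the roughly 70% reported in the caption.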

Fig 7.

Projection of individual clusters onto the simulated geometric landscape.

(a) After clustering analysis of the 169 scRNA-seq datasets, each resulting cluster is treated as a separate dataset and projected onto the simulated geometric landscape. The resulting projections, shown as red dots, were enriched toward the boundary between the cluster-like and trajectory-like regions of the geometric landscape. (b) Projections of one cluster-like dataset and its individual clusters, shown by the red triangle and red dots, respectively. (c) Projections of one trajectory-like dataset and its individual clusters, shown by the red triangle and red dots, respectively.

Fig 8.

Projections of example scRNA-seq datasets to the simulated geometric landscape.

(a) Projections of the four datasets visualized in Fig 1 onto the simulated geometric landscape. The two datasets with clear trajectory-like visualizations in Fig 1A and 1B are both mapped to the bottom-left side of the UMAP landscape, which contains predominantly simulated trajectory-like datasets. The two datasets with clear clustering-like visualizations in Fig 1C and 1D are both mapped to the middle and right side of the UMAP landscape, which contains mostly simulated cluster-like datasets. (b) Projections of the four datasets visualized in Fig 2, for which clustering and trajectory inference analyses give different geometric interpretations. The two datasets in Fig 2A and 2B are mapped to the left side of the geometric landscape, surrounded by simulated trajectory-like datasets. The two datasets in Fig 2C and 2D are mapped to the middle region of the geometric landscape, surrounded by simulated cluster-like datasets.
