Predicting lung aging using scRNA-Seq data
Fig 2
Cell type mapping, dataset integration, comparison of gene types and cell types.
A. Joint cell embeddings of the query datasets (IPF, Carraro, Nuclear-Seq) and the reference dataset (HLCA) after cell type transfer with scArches. Plots were generated from the 30-dimensional latent representation obtained by transforming the original gene space. B. Joint cell embeddings of the query datasets (IPF, Carraro) and the reference dataset (HLCA) after dataset integration with mnnpy. Plots were generated from the intersection of the highly variable genes (HVGs) of the reference and query datasets. C. Mean R² scores of all tested methods. For each method, the R² scores shown are those of the 10 cell types with the highest R² scores. P-value annotation legend: ns (not significant): p > 0.05; *: 0.01 < p ≤ 0.05; **: 0.001 < p ≤ 0.01; ***: 0.0001 < p ≤ 0.001; ****: p ≤ 0.0001. D. Rankings of the different gene marker types for the top cell types. Rankings were computed separately for each cell type. For each cell type, we selected each gene type's best PCA setting, defined as the setting with the highest R² score, and used that R² score as the representative score of the gene type for that cell type. We then ranked the gene types by these representative R² scores. The resulting rankings (from 1 to 6) are shown for the top 10 cell types, which are taken from the top 10 cell types shown in Figs 3 and 4. E,F. Predicted vs. true donor ages for the top cell types. For each cell type, we selected the gene type and PCA setting with the highest R² score; this best gene type and PCA setting are labeled in each plot.
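The p-value legend in panel C is a simple threshold lookup. The sketch below maps a p-value to its annotation symbol. It assumes the conventional cutoff p ≤ 0.0001 for **** (the lower bound implied by the *** row) and annotates a p-value of exactly 0.05 as *, following the * row. The function name `p_to_stars` is hypothetical and is not taken from the authors' code.

```python
def p_to_stars(p: float) -> str:
    """Return the significance annotation for a p-value (assumed four-star convention)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p <= 1e-4:      # ****: p <= 0.0001
        return "****"
    if p <= 1e-3:      # ***: 0.0001 < p <= 0.001
        return "***"
    if p <= 1e-2:      # **: 0.001 < p <= 0.01
        return "**"
    if p <= 5e-2:      # *: 0.01 < p <= 0.05
        return "*"
    return "ns"        # not significant: p > 0.05
```

Checking the thresholds from the most significant bin upward means each bin only needs its upper bound, so the intervals cannot overlap or leave gaps.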
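The ranking procedure of panel D, and the best-setting selection of panels E and F, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout `(cell_type, gene_type, pca_setting, r2)`, the function names, and the placeholder labels and toy R² values in the usage lines are all hypothetical. Ties in R² are broken by the order in which records appear.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (cell_type, gene_type, pca_setting, r2)
Record = Tuple[str, str, str, float]


def best_setting_per_gene_type(records: List[Record]) -> Dict[str, Dict[str, Tuple[str, float]]]:
    """For each cell type and gene type, keep the PCA setting with the highest R^2."""
    best: Dict[str, Dict[str, Tuple[str, float]]] = defaultdict(dict)
    for cell_type, gene_type, pca_setting, r2 in records:
        current = best[cell_type].get(gene_type)
        if current is None or r2 > current[1]:
            best[cell_type][gene_type] = (pca_setting, r2)
    return dict(best)


def rank_gene_types(records: List[Record]) -> Dict[str, Dict[str, int]]:
    """Panel D: rank gene types within each cell type by representative R^2 (1 = best)."""
    rankings: Dict[str, Dict[str, int]] = {}
    for cell_type, per_gene in best_setting_per_gene_type(records).items():
        # Stable sort: equal R^2 values keep their first-seen order.
        ordered = sorted(per_gene, key=lambda g: per_gene[g][1], reverse=True)
        rankings[cell_type] = {g: rank for rank, g in enumerate(ordered, start=1)}
    return rankings


def best_overall(records: List[Record]) -> Dict[str, Tuple[str, str, float]]:
    """Panels E,F: for each cell type, the (gene_type, pca_setting) pair with the highest R^2."""
    out: Dict[str, Tuple[str, str, float]] = {}
    for cell_type, gene_type, pca_setting, r2 in records:
        if cell_type not in out or r2 > out[cell_type][2]:
            out[cell_type] = (gene_type, pca_setting, r2)
    return out


if __name__ == "__main__":
    # Placeholder labels and toy R^2 values, for illustration only.
    toy = [
        ("cellA", "g1", "pca50", 0.61),
        ("cellA", "g1", "none", 0.55),
        ("cellA", "g2", "pca20", 0.70),
        ("cellA", "g3", "none", 0.20),
        ("cellB", "g1", "none", 0.40),
        ("cellB", "g2", "pca50", 0.35),
    ]
    print(rank_gene_types(toy))  # {'cellA': {'g2': 1, 'g1': 2, 'g3': 3}, 'cellB': {'g1': 1, 'g2': 2}}
    print(best_overall(toy))     # {'cellA': ('g2', 'pca20', 0.7), 'cellB': ('g1', 'none', 0.4)}
```

Collapsing each gene type to its best PCA setting before ranking means a gene type is compared on its strongest configuration, which is what the "representative R² score" in panel D describes.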