Transcriptome analysis reveals high tumor heterogeneity with respect to re-activation of stemness and proliferation programs

Significant alterations in signaling pathways and transcriptional regulatory programs together represent major hallmarks of many cancers. These, among all, include the reactivation of stemness, which is registered by the expression of pathways that are active in the embryonic stem cells (ESCs). Here, we assembled gene sets that reflect the stemness and proliferation signatures and used them to analyze a large panel of RNA-seq data from The Cancer Genome Atlas (TCGA) Consortium in order to specifically assess the expression of stemness-related and proliferation-related genes across a collection of different tumor types. We introduced a metric that captures the collective similarity of the expression profile of a tumor to that of ESCs, which showed that stemness and proliferation signatures vary greatly between different tumor types. We also observed a high degree of intertumoral heterogeneity in the expression of stemness- and proliferation-related genes, which was associated with increased hazard ratios in a fraction of tumors and mirrored by high intratumoral heterogeneity and a remarkable stemness capacity in metastatic lesions across cancer cells in single cell RNA-seq datasets. Taken together, these results indicate that the expression of stemness signatures is highly heterogeneous and cannot be used as a universal determinant of cancer. This calls into question the universal validity of diagnostic tests that are based on stem cell markers.


Introduction
Cancer is one of the major causes of death worldwide [1]. A significant progress in cancer treatment has been achieved with the development of a large therapeutic arsenal including chemotherapy, surgery, radiation therapy, and immunotherapy [2][3][4]. An important direction in clinical research focuses on the so-called Cancer Stem Cells (CSC), a subpopulation of tumorous cells with high tumor-initiating potential [5]. CSCs are increasingly regarded as prominent targets for anti-cancer therapy, although the degree of expression of the stem celllike phenotype, referred to as stemness, may vary between different tumors [6]. Several hypotheses relate stemness with the origin of cancer. It is acknowledged that cancers arise either from a malignant transformation of a progenitor cell, or from a non-stem cell, which reacquired the stemness potential [7][8][9]. This paradigm is sustained by significant convergence of stem cells (SC) and CSCs in the activated signaling cascades, moreover in their overlapping expression of a set of biomarkers encompassing the classical self-renewal-associated pathways Wnt/β-catenin, Bmi-1, sonic hedgehog, Notch, and PTEN [10]. Additionally, both SCs and CSCs express tissue-specific stem cell markers [11][12][13][14]. Such concordant molecular profile stipulates key aspects of SC and CSC phenotype including longevity, dormancy, niche dependence, and the potential for asymmetric cell division [15][16][17][18].
Tumors display frequent inter-and intratumor heterogeneity in alterations of the global gene expression programs and, in particular, in the expression of stemness markers [19][20][21][22]. Commonly used CSC-associated markers have a variable expression in glioblastoma [23]. The expression of breast cancer stem cell markers, ALDH1A1 and CD133, shows spatial heterogeneity across patients [19]. Stemness phenotype is also heterogeneous among many normal adult SC populations in the human body, where the SCs uphold tissue regenerative capacity [24,25]. Moreover, the degree of activation of stemness programs is associated with increased expression of multiple immunosuppressive pathways and decreased anticancer immunity [26]. This suggests that the stemness state may itself be highly heterogeneous within and between tumor types, which may play an important role in cancer pathogenesis.
A number of metrics have been developed to quantify stemness [20,26,27]. These metrics correlate with intratumoral heterogeneity, antitumor immune response, and clinical prognosis [20,26,27]. However, the heterogeneity of the stemness signature has not been assessed in detail [19]. Here, we reanalyzed the transcriptomes of 19 tumor types from The Cancer Genome Atlas (TCGA) Consortium and juxtaposed them with the transcriptomes of human embryonic stem cells [28], adult stem cells [29,30], and induced pluripotent stem cells (iPSC) within 4 days of differentiation [31]. The comparison of these transcriptomes from the perspective of reactivation of the stemness program revealed a pronounced heterogeneity both within and between tumor types, which in many cases was associated with tumor-specific patient survival. To interrogate intratumoral heterogeneity, we additionally demonstrated an increased variability of stemness signature in cancer cells compared to non-cancer cells and in metastatic outgrowth compared to primary tumors, and convergence of cancer cells to stem cells using single cell transcriptomes.

RNA-seq data and related clinical data
Gene expression values for tumors and normal tissues from the TCGA dataset and all relevant metadata were downloaded in the form of read counts using R package TCGAbiolinks [32] (S1 File). The fastq files for stem cell (SC) datasets comprising iPSC (fibroblasts purified from skin-punch biopsies from six males and six females that were reprogrammed using transfection with an episomal plasmids containing OCT3/4, SOX2, KLF4, L-MYC, LIN28, and an shRNA against p53 [33]), ESC, other types of pluripotent stem cells (PSC), and adult stem cells (ASC) were either downloaded from Sequence Read Archive (SRA) or obtained by direct download from arrayexpress ftp server (S1 Table). All fastq files were processed by a uniform pipeline, in which the reads were first checked for quality, trimmed using TrimGalore, and mapped with STAR version 2.7.1a [34] to the December 2013 assembly of the human genome (hg38, GRCh38). Gene expression levels were quantified using the Subread tool [35] implemented in Rsubread package version 1.34.7 [36]. The read count matrices from TCGA and stem cell datasets were merged, filtered by gene expression, and normalized using edgeR [37]. The resulting combined matrix was corrected for tumor purity as explained below.
The raw UMI counts for the scRNA-seq dataset (63,689 cells from 23 primary colorectal cancer and 10 matched normal mucosa samples) were downloaded from GEO under the accession GSE132465 [38]. The raw UMI counts for mESC and iPSC scRNA-seq datasets were downloaded from GEO and ArrayExpress under the accession numbers GSE135509 and E-MTAB-6687, respectively [39]. All counts were normalized for sequencing depth per cell and transformed to log 2 -scale using a pseudocount of one. The subsequent analysis and visualization of single cell RNA-seq data, including integration using a reciprocal PCA workflow, were done using R package Seurat version 4.0.2. Only sample 4 of iPSC dataset was used as it possessed the highest average stemness signature score. The lung adenocarcinoma scRNAseq dataset with metastatic samples was accessed and processed following the guidelines provided by the authors of the original publication [40].

Correction for tumor purity
s is the TMM-normalized log 2 (CPM) of the gene i in the sample s, and P s is the value of the CPE of the sample s, and ε i,s is the error term. The purity-corrected TMM-normalized log 2 (CPM) of gene i in the sample s were obtained from Y i,s by subtracting the linear part, i.e., Y cor i;s ¼ Y i;s À b i;1 � P s . Since the CPE metric provides an estimate for the proportion of tumor cells, which are presumably absent in normal tissues, we assigned a random value close to zero from normal distribution with the mean 0.08 (0.05 percentile of CPE distribution) and standard deviation 0.03 (� 1 3 of the 0.05 percentile of CPE distribution) to CPE of all normal tissues and iPSCs. The resulting values of Y cor i;s showed no statistically significant dependence on CPE (mean R 2 ' 0).

Differential gene expression analysis
The edgeR package was used for differential expression analysis. Raw read counts were normalized by edgeR using the TMM (Trimmed Mean of M-values) method [37]. The model was fitted to the data using glmQLFit function. Differential gene expression was estimated with glmTreat function. For GO enrichment analysis, differentially expressed genes were defined by |log 2 (FC)| � 1 and adjusted p-value cutoff of 0.05.

Gene ontology analysis
Gene Ontology analysis of differentially expressed genes was done using R package clus-terProfiler [42] with the default parameters.

Signature gene sets
The set of stemness gene signatures (n = 271) was originally designed by DPa by a combination of literature search and analysis of transcriptomic datasets. However, his efforts were interrupted by force majeure circumstances, and the rest of the team had to reassemble a similar set using another procedure outlined below.
We considered datasets spanning various experimental protocols, e.g. tissue-gene expression atlases, ESC differentiation time series, and knockdowns of pluripotency-associated transcription factors (ppTF-KDs) [43][44][45][46]. The datasets were analyzed one at a time, with individually selected cutoffs (S2 Table). The microarray gene expression data were downloaded using the R package GEOquery [47], normalized using the rma function available in the R package affy [48], log 2 -transformed and processed following the guidelines from the affy package vignette.
Our collection included two gene expression atlases, GSE1133 and GSE10246, both containing gene expression data from various mouse tissues and cell lines. We identified a group of samples, in which the stemness program could be active (S2 Table), and selected only one group in each atlas that was closest to ESCs (Blastocysts in GSE1133 and mouse ESCs in GSE10246). In each atlas, we computed log 2 FC between the selected group and the rest of tissues or cell lines and computed E, the log-sum of gene expression values across samples. Then, we selected genes that satisfied the following conditions in at least n comp. of the comparisons in each atlas: log 2 FC > 0.05, E > 3, n comp. = 34 for GSE1133, and log 2 FC > 0.1, E > 4.5, n comp. = 49 for GSE10246. This resulted in two lists of putative stemness genes with 5764 and 2670 elements for GSE1133 and GSE10246, respectively.
Next, we analyzed a collection of ESC-line differentiation datasets, which included time series analysis of 14 days of differentiation in three mouse ESC-lines: J1, R1 and v6.5. As before, we computed E and log 2 FC between the first (0 hours) and last (14 days) time points, separately for each cell line. The cutoffs log 2 FC > 0.05 and E > 3 resulted in three lists of 3286, 3567, and 2609 putative stemness genes corresponding to J1, R1 and v6.5 lines, respectively. A similar analysis of shRNA knockdowns of five core pluripotency transcription factors, POU5F1, NANOG, SOX2, ESRRB and SALL4, in mouse ESCs resulted in 8588 genes with log 2 FC > 0.05 and E > 3 in at least one of the experiments.
The intersection of these lists consisted of n = 454 genes, to which we added a set of manually collected and curated putative stemness genes, which were not picked by our analysis (n = 19) (S2 File). Then, we removed tissue-specific genes [49] (n = 7), proliferation signature genes (n = 49, see below), and genes that were functionally associated with the cell cycle or proliferation according to either KEGG or MsigDB databases (n = 28). This resulted in a list of 389 unique murine genes, which were mapped to their human orthologs using R package BiomaRt. The details of this procedure including cutoffs and group comparisons are also summarized in S2 Table. To define the proliferation signatures, we used the list of genes that are functionally associated with proliferation as provided by Ben-Porath et al (n = 326) [50]. To infer activity of epithelial-to-mesenchymal transition (EMT) and mesenchymal-to-epithelial transition (MET), we selected TFs associated with either of the two processes, SNAI1 and SNAI2 for EMT, and GRHL1-3 for MET. Then, we combined the selected TFs with their protein interactors and coexpressed genes according to STRING-DB (v.11.0) [51] resulting in a list of n = 51 genes. The resulting gene lists are summarized in S3 File.

Calculation of signature intensity metric
To compare the degree of reactivation of the stemness program between different tumors, we introduced a metric called signature intensity, which represents the percentage of samples that belong to the Stem cluster among all samples of the given tumor. Recall that a sample was assigned to the Stem cluster (respectively, Normal cluster) if it was located closer to (respectively, further from) ESC/iPSCs according to PC1 than the global median of the PC1 axis. That is, the signature intensity I j of the tumor j was computed as

Survival analysis
To test whether the reactivation of the stemness program correlates with poorer prognosis, we stratified tumor samples into two clusters relative to the median PC1 value of a given tumor type. For a given tumor we subdivided samples into Stem and Normal clusters based on the proximity to iPSCs on PC1. However, instead of using the global median PC1 over all tumors as a threshold, we used the median PC1 of each tumor type. This approach resulted in tumorspecific clusters of balanced size, which were used for survival analysis. We fitted Cox-regression using Stem and Normal clusters as predictors for each tumor when clinical data were available. Hazard ratios (HR) between two clusters were then computed for all tumors using R package survminer.

Statistical analysis
All statistical analyses were done using R software version 3.6.0. Confidence intervals for proportions were computed using a 2-sample z-test without continuity correction. All tests were carried out at the 5% significance level with Benjamini-Hochberg correction for multiple testing.

Stemness genes
Several lists of stemness marker genes currently exist, however there is no universal such list [52]. For instance, a recent meta-analysis of stemness genes from several independent studies revealed that only one gene was common among lists derived from three SC populations [53]. Complementary to this, here we assembled a list of pluripotency markers using the data from a range of different experimental approaches including tissue atlases, ESC differentiation timeseries, and knockdowns of pluripotency-associated transcriptional factors [43][44][45][46]. To distinguish between stemness and other traits, we removed proliferation-related genes and tissuespecific genes [49] (see Methods for details) and obtained a list of (n = 389) stemness genes including master regulators of stemness POU5F1, SALL4, NANOG as well as many other genes involved in the transcriptional regulation and cellular metabolism (S3 File) [54]. The comparison of this set with the ESC signatures provided in other studies [50, 55, 56] revealed a moderate intersection (Fig 1), however not as large as the intersection of ESC signatures with each other. For instance, the gene set of

Tumors span continuously from normal tissues to iPSC
To investigate how tumors, normal tissues, ESC, ASC, and iPSCs compare in terms of stemness signature, we used principal component analysis (PCA) of the adjusted gene expression profiles restricted to the set of stemness genes (Fig 2A). Tumor samples distributed over the first principal component (PC1, 24% of the total variance) forming two distinct clusters located between normal tissues and ESCs/iPSCs, while ASCs were located close to the normal tissues on the PC1 axis. At that, the iPSCs differentiation time series also clustered along PC1 so that less differentiated cells were located further from the center. Remarkably, the second principal component (PC2, 8% of the total variance) separated iPSCs and ESCs. The higherorder principal components did not show any clear separation related to stemness genes (S3 The expression of particular stemness markers followed the pattern of clustering along PC1 (S5 Fig). SALL4, one of the major regulators of pluripotency, was among the genes with the highest loadings in PC1, with a gradual increase in expression along the PC1 axis reaching its maximum at iPSCs. On the other hand, SALL1, which is suppressed in breast cancer [61], was ubiquitously downregulated in tumors, while being steadily expressed in both stem cells and normal tissues. The expression of POU5F1, a key regulator of stem cell pluripotency, substantially increased, while the expression of CBX7, a gene that encodes a Polycomb protein which globally regulates cellular lifespan [62] and is a tumor suppressor in both mice and humans [63], decreased towards ESC/iPSCs (S5 Fig).
To check whether the observed clustering along PC1 axis was consistent with patterns reported for particular tumor types, we analyzed the stemness signature in two related, yet different cancers, lung squamous cell carcinoma (LUSC) [64] and lung adenocarcinoma (LUAD) [65]. LUSC has a higher mortality rate compared to LUAD and shows a pronounced upregulation of sonic-hedgehog, a major regulator of the developmental pathway linked to stemness and proliferation, which is often active in adult stem cells and is absent in LUAD [66,67]. Consistent with this, LUSC was located closer to SC along the PC1 axis, while LUAD was closer to normal tissues (S6A

Analysis of proliferation and EMT signatures
The expression of stemness genes revealed clustering of tumor samples in between normal tissues and ESC/iPSCs, which may not be a unique property of stemness genes. However, no clustering was observed when we performed PCA on a random set of genes that were matched by expression levels to stemness genes (S8 Fig). In contrast, when we repeated the same analysis using proliferation markers, tumor samples again scattered across the PC1 axis (55% of variance) forming two separate clusters (Fig 2B), and the differentiation states of iPSCs separated concordantly along this axis. As before, PC2 separated ESCs from iPSCs, while no clear separation was observed in higher order principal components suggesting that PC1 represents the proliferation signature (S3B Fig). In contrast, the clustering of tumors, normal tissues, and iPSCs with respect to the epithelial to mesenchymal transition (EMT) signature was drastically different (Fig 2C). Except for the marginal standing of ASC, in which EMT must be active, we did not observe any clear separation of tumor samples from normal tissues, nor did we detect a directional trend of tumor samples along any axes. There was no clear separation in higher-order principal components, and most of the tumors showed epithelial phenotype similar to their tissues of origin, except for hepatocellular carcinoma, melanoma and brain tumors (S3C Fig). Instead of the gradient of EMT signature, we observed a switch-like state of master regulators of EMT (S9 Fig). Altogether, this analysis failed to capture EMT signature in tumor tissue, possibly because cells undergoing EMT are located at metastatic outgrowth or even dispersed as a transient circulating population thus preventing their detection in bulk sequencing [68].

Heterogeneity of stemness and proliferation signatures across tumor types
The analysis of stemness and proliferation signatures revealed two clusters of tumor samples, one of which was located closer to iPSCs along PC1, and the other was located closer to normal tissues (Fig 2A and 2B). Since this separation occurred along the PC1 axis in both signatures, we formally defined the clusters by their position relative to the median of the PC1 across all tumor samples (Fig 3, black vertical line). Namely, samples located to the right of the median of PC1 axis towards normal tissues were assigned to the Normal cluster, while samples located to the left of the median towards iPSC were assigned to Stem cluster.
Next, we investigated whether any tumor type was specifically enriched in one of the two clusters (Fig 3A-3C). Indeed, some tumors were fully localized to one of the clusters, e.g. Rectal Adenocarcinoma and all subtypes of Renal Cell Carcinoma (KIRP, KIRC, KICH), while others showed a significant heterogeneity in terms of both stemness and proliferation signatures, e.g. Lung Adenocarcinoma and Liver Hepatocellular Carcinoma. To quantify the heterogeneity of stemness and proliferation signatures within each tumor type, we computed a metric called intensity, which is defined as the proportion of samples of a given tumor type that are located in the Stem cluster (see Methods for details). In other words, this metric captures the degree, to which each tumor type collectively expresses stemness or proliferation signatures.
Despite stemness and proliferation intensities being strongly correlated, they differ significantly for some tumor types (Fig 4A and 4B). For instance, the stemness intensity of glioblastoma (GBM) is 28%±7%, while its proliferation intensity is 50%±8%. Conversely, the stemness intensity of hepatocellular carcinoma (LIHC) is 46%±5%, while its proliferation intensity is 31%±5%. Thus, these two metrics capture consistent, yet different aspects of tumor gene expression. Most tumors with high purity have low stemness and proliferation intensities, consistent with lower aggressiveness of such tumors [41], while tumors of higher stage tend to have higher intensity of both signatures. Remarkably, tumors that arise from tissues with high regenerative capacity have higher intensity of both signatures, presumably reflecting the innate stem cell populations that maintain homeostasis.

Intertumor variability correlates with survival
The intensity of stemness and proliferation signatures separates most tumors into two groups corresponding to Normal and Stem clusters by the median of the PC1 axis. However, this subdivision reflects global heterogeneity of stemness among all analyzed tumor types. In order to assess the intertumor heterogeneity, i.e., the heterogeneity between different tumors of the same tumor type, we used the PC1 median of a given tumor type rather than the global PC1 median across all tumor samples to cluster samples into Stem and Normal groups specific to each tumor type (Fig 5A). This approach provided balanced groups, which we used for survival analysis.
We observed a significant difference in survival between the Stem and Normal clusters only in a fraction of tumors ( Fig 5B). Remarkably, the hazard ratio was significant for tumors that did not show the highest stemness and proliferation intensity (the bottom left quadrant in Fig  4), while tumors with the highest stemness and proliferation intensity (the upper right quadrant in Fig 4) did not show a significant difference in survival. For instance, we did not observe a significant difference in hazard ratio for LUSC despite high stemness and proliferation intensity, while skin cutaneous melanoma (SKCM), which had the stemness intensity on the level of LUSC, showed a significant difference in the survival between Stem and Normal clusters ( Fig  6). A similar heterogeneity was observed for tumors with low stemness intensity such as kidney renal carcinoma (KIRC) and low grade glioma (LGG).

Intratumor heterogeneity
Unlike intertumor variability, which reflects differences between different tumors of the same type from different patients, intratumor variability refers to genotypic and phenotypic differences between clonal populations of cells within a tumor. The quantitative measurement of the transcriptional diversity of cells within a tumor are provided by transcriptomic profiling at a single-cell resolution [69]. To assess the intratumor heterogeneity of stemness signature, we re-analyzed the transcriptomes of 65,362 unsorted single cells from metastatic colorectal

PLOS ONE
Stemness heterogeneity across tumors cancers [38]. We used the loadings, i.e., the coefficients of the linear combination of stemness signature genes, that correspond to the PC1 axis in Fig 2A to project the transcriptional profiles of single cancer and non-cancer cells onto PC1 (Methods). The 2D representation using uniform manifold approximation and projection (UMAP) [70] revealed a clear separation of cancer and non-cancer cells consistent with the annotation from [38] (Fig 7). Remarkably, the cancer cell cluster shows a considerable increase of the stemness signature encoded within PC1 (Fig 7A). However, not only the absolute value of the stemness signature, but also its variability was higher in the cancer cell cluster compared to non-cancer cell cluster indicating that the heterogeneity of stemness signature is observed at all levels of cancer organization, including single-cell level.
To compare the stemness capacity among individual tumor cells and stem cells, we integrated two additional scRNA-seq datasets of mESC [71] and human iPSC [72] cultures along with scRNA-seq dataset of colorectal cancer. We computed the PC1 projection for each cell using PCA loadings from Fig 2 as a quantitative proxy of stemness, proliferation and EMT signatures. As expected, we observed a gradient of stemness signatures from normal to cancer cells and then to stem cells, which encompass both adult stem cells residing in the niches of the normal tissues (labelled ASC) and iPSCs with ESCs ( Fig 8A). The proliferation signatures mirrored the stemness signatures with a notable exception of iPSCs, which had stemness signatures below ASC while being proliferative more active. The EMT signatures were distributed  bimodally with cells of positive scores having more mesenchymal expression characteristics. Remarkably, while the majority of cancer cells had negative scores, presumably reflecting the epithelial origin of colorectal cancer, a large fraction (25%) likely containing CSCs still had mesenchymal signature. The reduction to two dimensions with UMAP revealed two clusters with high overall stemness signatures, one that corresponded to ESC and another to iPSCs, which both converged into a more differentiated state represented by a cluster that was shared between ESCs and iPSCs ( Fig 8B).
It was reported that cancer cells comprising metastatic outgrowths possess prominent stem-like characteristics [73]. To investigate this trait of cancer cell stemness, we computed PC1 projections for each cell in a scRNA-seq dataset that contains normal tissues, primary tumors and metastatic lesions of lung adenocarcinoma [40]. Indeed, single cells from these three categories showed significantly increased stemness signatures in metastatic lesions in comparison to both primary tumors and normal tissues ( Fig 8C).
Next, we questioned how the heterogeneity of stemness signature compares between individual cells and between tumors that arise in different patients. To address this, we reanalyzed stemness signatures of individual cells in the colorectal cancer cohort taking the patient information into account and found that a substantial proportion of variance of stemness signature (13.5%, 1-way ANOVA, P-value < 10 −16 ) is attributed to intertumoral, as opposed to intratumoral heterogeneity. Remarkably, cancer cells in some patients (SMC10, SMC08) were bimodally distributed along the stemness signature score, potentially suggesting a presence of cancer cell subpopulations of different potency (Fig 8D).

Discussion
Transcriptome comparisons have been a powerful tool for studying gene expression signatures in disease [74][75][76]. We used principal component analysis (PCA), which represents each sample as a point in an n-dimensional space with coordinates corresponding to gene expression values and applies a dimensionality reduction to identify principal components with the largest variance. The outcome of these transformations, however, critically depends on the set of genes that were chosen initially. For instance, the use of different gene sets led to opposite outcomes with tissue-dominated and species-dominated clustering [77]. In this work, we identified a gene set corresponding to the stemness signature encoded in PC1, which followed the stemness gene expression gradient from the normal tissues through tumors and ASC to ESC

PLOS ONE
and iPSCs. This gene set provides a better representation of the stemness axis compared to previously published gene sets [50, 55,56].
The TCGA dataset showed a consistent reactivation of stemness in all solid tumors in comparison to normal tissues, in accordance with other reports [78][79][80]. At the same time, the degree of reactivation of the stemness and proliferation signatures varied greatly between different tumor types and also inter-and intratumorally, thus extending the results of previous studies on stemness heterogeneity to the TCGA dataset [27,81]. Remarkably, tumors originating from the tissues that actively interact with the outside environment (lungs, urinary system, skin, intestines) show strong reactivation of both stemness and proliferation programs consistently with the hypothesis that tumors inherit their self-renewal capacity from the tissues of origin. The presence and abundance of innate stem cell populations is an important factor that contributes to the heterogeneity of stemness signatures between different tumor types.
An important aspect of the reactivation of gene expression programs in tumors is formulated by the Lineage Addiction Model, which suggests that the mechanisms that promote tumor progression involve master regulatory genes that also exert key survival roles in development [82]. Multiple examples of cancer lineage addiction have been reported [83][84][85][86], however the effect of lineage addiction on the reactivation of the stemness program remains to be studied in detail. In our analysis, only a few tumor types (Breast Cancer, Lung Adenocarcinoma, Lung Squamous Cell Carcinoma) spanned the entire PC1 axis, while most tumor types were fully localized to either Stem or Normal cluster. This observation supports the idea that the majority of tumors are restricted to specific phenotypic spaces with respect to the reactivation of the stemness program.
In the study of Ben-Porath et al, high expression of stemness genes was shown to be predictive of poor survival among the patients of three breast cancer cohorts [50]. Other studies also reported reduced survival for the tumors of stem phenotype [87][88][89][90]. Overall, as one of the hallmarks of cancer [91], stemness has been suggested before as a universal predictor of patient survival [27]. In our analysis, however, only seven out of sixteen tumors, for which patient survival data were available, showed a significant negative correlation of stemness with survival. These results highlight the absence of universal ties between patient survival and the degree of stemness signature reactivation, indicating a substantial degree of heterogeneity in tumor reaction to the latter [26]. This analysis suggests that in spite of associations with tumor stage and aggressiveness, the stemness is not a universal predictor of survival and could be regarded instead as a function of the molecular profile that is specific for every tumor type.
According to the cancer stem cell hypothesis, cancers arise from transformation of stem or progenitor cells that are capable of multilineage differentiation. The analysis of CSCs in scRNA-seq data cannot be carried out directly since their identities are not known. However, heterogeneous stemness, proliferation, and EMT signatures may partially serve as indicators of CSCs in a large pool of cells. We observed a larger heterogeneity of stemness signature both in cancer cells and stem cells grown in culture, not including ASC, presumably due to inherent transcriptomic instability of cancer cells that can potentially reach extremes in both differentiation and dedifferentiation. The heterogeneity of stemness signature in iPSCs and ESCs seems to arise from stochastic differentiation processes that are known to occur in cell culture [92,93]. In this regard, transcriptomic instability of cancer cells that occasionally leads to stochastic dedifferentiation could serve as a source of CSCs.
Interestingly, endothelial cells, fibroblasts and stromal cells all showed mesenchymal scores, while iPSCs and ESCs yet again showed high variability covering both mesenchymal and epithelial scores. This stands in line with previous observations that pluripotent stem cells in cell culture persist in a dynamic balance between epithelial and mesenchymal identities [94]. However, most cancer cells are enriched in epithelial state, whereas a fraction of cells (likely CSCs) still possess high mesenchymal scores. This can be explained by the fact that the bulk of cells comprising the tumor maintain epithelial identity with only a few cells undergoing EMT. It is also known that cancer cells undergoing EMT are involved in metastasis of a tumor, in full agreement with our results.
The major limitation of the present study arises from the use of bulk RNA-seq, which is unable to capture cellular heterogeneity. Single-cell RNA-seq experiments provide an orthogonal view to the bulk RNA-seq by measuring the transcriptional profiles of individual cells, however at the expense of sparse coverage. Here, we used colorectal cancer to demonstrate as a proof of principle that the expression of stemness signature is highly heterogeneous not only intertumorally, but also at the level of individual cancer cells, with a larger contribution from intratumoral heterogeneity. Additionally, it was reported that specific patterns of hyper/hypo methylation follow the stem phenotype in cancer cells [95,96]. Therefore, another source of stemness heterogeneity may come from the epigenetic component or from other factors such as variability in pre-mRNA splicing [97] and expression of non-coding genes, e.g. transposons [98]. A combined analysis of single-cell transcriptomes, pre-mRNA splicing by bulk RNA-seq, epigenetic assays based on ChIP-seq, and non-coding RNA quantification is needed for the detailed characterization of the stemness heterogeneity landscape. Growing amounts of this information in the public domain enable such characterization from the multi-omics perspective and open new directions for future studies.

Conclusion
Tissue-independent reactivation of the stemness program represents a unifying feature which ties together tumors of different origins. Nonetheless, the degree to which this program becomes reactivated strongly fluctuates between different tumor types and also inter-and intratumorally, thus suggesting that the effect of the stemness program on the tumor phenotype is highly heterogeneous. Multiple studies, including those based on single cell technology, have described tumor heterogeneity arising at different levels through various mechanisms. Here, we pinpoint yet another and previously unappreciated aspect of heterogeneity, one that is related to the reactivation of the stemness program, thus adding to an already complex picture of tumorigenesis and potentially impacting the diagnostic protocols and the development of new anticancer treatments.  Table. GEO accession numbers of the datasets that were used to assemble a set of stemness signature genes. Additional information is reported in Methods (see Signature gene sets section for details). log 2 FC denotes log-fold-change threshold (Treatment vs Control); log 2 E denotes log gene expression level threshold (Treatment + Control); Comparisons denotes the list of comparisons for differential gene expression analysis.