Transcriptomic entropy benchmarks stem cell-derived cardiomyocyte maturation against endogenous tissue at single cell level

The immaturity of pluripotent stem cell (PSC)-derived tissues has emerged as a universal problem for their biomedical applications. While efforts have been made to generate adult-like cells from PSCs, direct benchmarking of PSC-derived tissues against in vivo development has not been established. Thus, maturation status is often assessed on an ad-hoc basis. Single cell RNA-sequencing (scRNA-seq) offers a promising solution, though cross-study comparison is limited by dataset-specific batch effects. Here, we developed a novel approach to quantify PSC-derived cardiomyocyte (CM) maturation through transcriptomic entropy. Transcriptomic entropy is robust across datasets regardless of differences in isolation protocols, library preparation, and other potential batch effects. With this new model, we analyzed over 45 scRNA-seq datasets and over 52,000 CMs, and established a cross-study, cross-species CM maturation reference. This reference enabled us to directly compare PSC-CMs with the in vivo developmental trajectory and thereby to quantify PSC-CM maturation status. We further found that our entropy-based approach can be used for other cell types, including pancreatic beta cells and hepatocytes. Our study presents a biologically relevant and interpretable metric for quantifying PSC-derived tissue maturation, and is extensible to numerous tissue engineering contexts.

both murine fetal and infant growth.
As noted by the reviewer, all of these factors could lead to changes in the specific timing of the entropic trajectory over CM maturation. Even within a litter, individual mice develop at different rates. As an example, we plot entropy scores for three biological replicates animals at several timepoints within one dataset (Fig. SR1). Given the fact that different studies often used different strains, this may explain minor differences in entropy score between studies, though it would be impossible to deconvolve this variable from other differences between studies. Fig. SR1. Entropy score for multiple animals at select timepoints within one dataset. We included at least one male and one female at each timepoint, though unfortunately do not have labeled which animal was which, so we cannot conclude specific sex-based differences at this time.
However, as suggested by the reviewer, we anticipate these differences tend to be marginal compared to the large-scale changes associated with maturation. To use our example here, the spread of cell maturation within an animal and the differences between timepoints are both larger than the differences between individual animals. Likewise, we don't expect the genes that become differentially expressed to change widely, though the specific timing of their changes may be somewhat accelerated or delayed.
Third, when performing the entropy scores and pseudotime analyses, were there specific transcripts or groups of transcripts that were more informative of specific stages of maturation? You mention that ∼81.5% were identified as differentially expressed by all methods and some transcript profiles are shown in Figure 4e, but were any informative genes or gene sets (i.e., markers) more useful for assessing maturation that would not require scRNA-seq? This information (which could be added in the supplement) might make your approach more accessible to the broader research community (i.e., the identification of new and informative markers of CM development or differentiation). Alternatively, it may be that scRNA-seq is required. If so, then this should be discussed.
The focus of our manuscript was on developing a method to stage CMs across scRNA-seq studies, particularly for benchmarking PSC-CMs. However, as noted by the reviewer, developing an improved understanding of in vivo maturation is important, both on its own merits and to identify better markers for studying PSC-CM maturation. Here, we would actually direct reviewers to a more recent study of ours, which touches upon these issues (8). In that study, we extended the in vivo reference generated in this manuscript, and thoroughly classified differentially expressed genes by their temporal expression patterns. In particular, we identified how genes change across the embryonic/neonatal and perinatal time periods identified in this manuscript. We should note that while several of the results in this newer manuscript were generated using traditional dimensionality reduction approaches, similar results could be obtained using entropy score given the strong concordance between the methods.
The applications of marker-based analysis should also be considered. One effective use of a panel of marker genes would be to see, for example, if PSC-CMs improve in maturation status in a biomimetic manner. However, the advantage of entropy score is its ability to compare cross-study, both in terms of benchmarking PSC-CMs to endogenous cells and across multiple PSC-CM datasets. In our experience, we have found qPCR to produce variable results across machines and labs, analogous to RNA-seq batch effects. Thus, even with a robust panel, it would likely be difficult to similarly benchmark qPCR results across different research groups. Of course, it is possible that we may be able to define a small panel of markers, perform qPCR, and then compute entropy on those measurements. However, testing this rigorously is beyond the scope of this manuscript.
Finally, could you comment further on the application of entropy scores to study maturation and how your approach may be of value to the research community? A number of situations beyond comparisons of engineered and in vivo tissues, and somatic cell reprogramming protocols might include an evaluation of PSC-CMs for pharmaceutical and toxicity testing, and the prediction of pathways that may be essential for maturation of cells either through a gene regulatory network or through individual signaling pathways. While these experiments and discussion points are not necessary to support your conclusions, an evaluation of these points and limitations in the Discussion may broaden the paper's impact and significance.
We appreciate this point from the reviewer. One particular area where we believe entropy score may be valuable is to facilitate understanding of regulators of in vivo CM maturation. There has been recent interest in studying cell-autonomous regulators of CM maturation through mosaic knockout systems (9,10). Our group recently used this approach to study the roles of PGC1 in endogenous CM maturation (11). As an example, we show below the computed entropy scores for PGC1α/β KO cells compared with control cells (Fig. SR2). As can be seen, a maturation defect initiates around ∼ P14 and is particularly prominent by P28. More recently, others have applied forward genetic screens using mosaic systems to identify regulators of CM maturation (12). However, identifying an optimal readout for such screens is challenging. We anticipate that entropy score may be useful in these contexts, particularly where extensive timepoint data may not be available. Using optimized protocols for high-throughput knockout screening followed by scRNA-seq, researchers could use entropy score to identify previously unstudied regulators of CM maturation. Of course, this same approach could be used with PSC-CMs to, for example, screen potential activators of in vitro maturation. We have included further discussion of these possibilities into our Discussion section. With regards to studying disease processes or toxicity, we direct readers to our comments below to Reviewer 2 about nonmaturation-related applications of entropy score. However, we urge some caution with interpretation. We focused our analyses on the maturation process specifically, and would need to perform more rigorous testing to study changes to entropy score in specific disease processes.

Response to Reviewer 2
It may be useful to emphasize that matching the entropy score of in vivo cardiomyocytes (or a given CM developmental state) is not a sufficient indication of matching the expression patterns of the in vivo counterpart.
This is a really important point identified by the reviewer; we have included some further discussion of this in the second paragraph of the discussion. Our recent manuscript (8) touches on this issue further, and we direct interested readers to our results there. As a quick summary, however, PSC-CMs may indeed have a comparable transcriptomic entropy or be matched to a given CM developmental state by dimensionality reduction methods while still showing important expression differences. Nevertheless, entropy score provides an approximation of the overall maturation stage of the cell.
Compare entropy scores from cardiomyocytes from snRNA-seq on post mortem tissue (Litviňuková, et al).
The Litviňuková et al. dataset (13) provides an interesting set of considerations. As discussed in the manuscript, we found that most nuclear datasets presented with overly low depth, well below our thresholds established in Fig. 3b. Moreover, sensitivity (e.g. number of genes detected per cell) was often low, below our standard of 1000 genes/cell. The Litviňuková et al. dataset is notably higher quality, with a median of 1545 genes/cell, though a depth that is still close to our low-end cutoff (3621 counts/cell). However, there is another issue that must be considered with nuclear datasets, in that they may fundamentally present with different gene distributions than whole cell datasets simply because of differences in RNA capture. There are several ways to note this. For one, we have observed that the ratio of counts:genes is higher in nuclear datasets compared to whole cell datasets (not shown here), inherently suggesting a different distribution of counts (with many more "single count" genes in nuclear datasets). Moreover, entire classes of genes show differences. As a clear example, mitochondrially-encoded transcripts will not be identified in nuclear datasets, while nuclear lncRNAs (most notably Malat1 ) will be enriched in nuclear datasets. This issue is further discussed in Selewa et al. (14). The fundamental assumption underlying using transcriptomic entropy to measure maturation status cross-dataset is that gene distributions should remain consistent across different samples at similar maturation states. However, this assumption is violated by nuclear-seq protocols. We anticipate that transcriptomic entropy could be more effectively extended to nuclear datasets once a comprehensive nuclear reference of mouse and human CM maturation (of comparable quality to the Litviňuková dataset) can be assembled, as we have done here with whole cell data. In the interim, we do not encourage direct comparison of nuclear datasets to whole cell datasets with transcriptomic entropy.
Nevertheless, some interesting analysis can be made within the dataset. We explored transcriptomic entropy across a sample of the ventricular CMs generated in the study (Fig. SR3). Interestingly, we did find a decrease in entropy score over timepoint (Fig. SR3a), despite the fact that all the CMs came from ostensibly mature hearts. However, this may be confounded with the source of the nuclei -in general, older samples came from the Sanger Institute (Fig. SR3b), and these samples systematically presented with higher percentages of MALAT1 (Fig. SR3c, potentially indicative of large sample isolation differences). This may also further indicate the need to develop unique gene filtration strategies for nuclear RNA-seq data, as we did in the manuscript for whole cell data. Notably, there were no differences in transcriptomic entropy across the various CM subpopulations identified by the authors Fig. SR2d. There are differences in cardiomyocytes obtained from different regions of the human heart (atrial vs. ventricular, left vs. right, etc.). It will be informative to compare the many in vitro differentiation datasets (and protocols) that may give result in atrial-like or ventricular-like CM to their in vivo counterparts.
The reviewer raises a particularly interesting point here -while it is well known that CMs show regional variations (even within the same cardiac chamber), the degree to which the maturation process varies across the chambers is unknown. Within our study, our generated perinatal maturation reference was isolated solely from LV CMs. However, when performing our meta-analysis of in vivo datasets, we included CMs from other anatomical locations as well, including RV and both atria. Specific details about the datasets can be found in the Appendix. Notably, despite comparing a fairly heterogeneous mix of CMs, we observed a generally consisting trend in entropy. This may suggest that despite specific molecular differences across chambers, maturation itself may be relatively well-conserved. Thus, we are optimistic that entropy score can be used to consistently study maturation in CMs regardless of anatomical origin. However, we acknowledge that this must be characterized more systematically in the future.
With regards to PSC-CMs, the question becomes more complicated. Protocols have been developed to push cells towards a more atrial-like or ventricular-like phenotype, though to date we have not observed any scRNA-seq datasets generated from these protocols. We focused our attention on PSC-CMs generated using standard protocols, which may generate a mix of phenotypes. Characterizing the chamber status of these PSC-CMs is complicated. We and others have reported that molecular markers of chamber status are often inconsistent in PSC-CMs (8,15), in particular because canonical atrial markers are also expressed in immature ventricular CMs in vivo. The gold standard for subtyping PSC-CM phenotype is patch clamping, but a) this does not always yield clear distinctions in phenotype and b) is currently incompatible with scRNA-seq. We suspect that further methods to assess CM subtype will be necessary to more comprehensively classify PSC-CMs.
This question pertains to in vitro CM differentiation: Is entropy score sensitive to cell-types that differentiate into alternative lineages during in vitro differentiation (issue of purity)? Different cell lineages may have different maturation rates and if they are not excluded, the non-cardiomyocyte cells could contribute to noisy measurements. If the entropy score is calculated after a first round of clustering, on identified CM among the population (as opposed to cardiac progenitor cells, for example), I would be more confident of the entropy score.
As noted by the reviewer, we anticipate that different cell types may have different maturation rates and thus different entropy scores. Thus, for all of our analyses involving PSC-CMs, we used the sample filtration approach as we did with our in vivo CM samples. Specifically, this including a cell type classification step using SingleCellNet. The results for PSC-CMs shown in the manuscript only include cells classified as CMs. (Please also note that specific studies may have had other filtration steps -more details for each study can be found in the Appendix). Likewise, because our analysis of PSC-CMs begins at D9 of differentiation, we anticipate that we have CMs rather than multipotent progenitors.
This also pertains to in vitro CM differentiation: Even within the cardiomyocyte lineage, there may be different rates of development that ultimately lead to the same end point. Therefore there may be the need to coarse-grain the developmental time-points to account for the precocious ones and the 'late bloomers'. It may be useful to anchor the developmental trajectory based on entropy score to biological milestones (such as when the CMs start beating in plates). Can the authors comment on this, please?
We agree with the reviewer that this question of differential rate of maturation is an interesting one. Both this manuscript as well as several others (8,11,16) have identified significant heterogeneity of maturation rates at the single cell level, despite eventual convergence to an adult state. The idea of anchoring to some other cell phenotypic readout is intriguing, though the challenge is in identifying the appropriate readout. For example, our focus is on CM maturation after specification. However, CMs begin to beat more or less immediately after specification -in vitro, PSC-CMs may begin to beat between D6-D7 of differentiation (depending on the protocol/cell line), while our analysis here already begins after that timepoint. We have also briefly looked at PSC-CM size. In our recent work (8), we observed a population of larger PSC-CMs emerging at D25 of in vitro differentiation. Surprisingly, however, we noted few transcriptional differences between larger and smaller cells, though similar results have also been seen in vivo (17). Another interesting recently identified candidate is CD36 (18). We anticipate that entropy score can be a useful tool for helping identify candidate milestones/phenotypic characteristics to stratify CMs by developmental rate.
CMs are interesting in that they are post-mitotic and as such, will attain a level or maturity at the end of the maturation process. I can imagine this not being the case for cells that continue to cycle and divide. It would be interesting to compare the change in entropy score for such cells. How about cells that differentiate when activated by an external stimulus (e.g., immune cells)? As long as a cell has high transcriptional variability or is transcriptionally active (e.g., as stress response) it may still show high entropy score. How would one interpret Entropy scores in such situations?
The focus of our analysis here was to generate a method to compare maturation states across single cell RNA-seq datasets, where maturation was implicitly defined based on the assumption of eventual quiescence (or approximate quiescence, in the case of liver/pancreas). The rationale for this focus was the practical observation that PSC-derived tissues aiming to recapitulate these largely quiescent organs have remained immature. However, as noted by the reviewer, there are many intriguing applications of transcriptomic entropy outside the scope of our analysis. Indeed, transcriptomic entropy was initially used to assess stem cell potency during differentiation processes. During our pilot studies (not shown here), we observed a dataset of early stage neuronal development in which entropy zigzagged as cells went through progressive stages of specification. We also guide readers to other studies of interest that have used similar entropic measures to study different aspects of biology. For example, Grun et al. studied entropy changes over the course of hematopoiesis (19). The same study also investigated a population of putatively dedifferentiated ductal progenitor cells in adult pancreas. The Teschendorff group has also used a signaling entropy metric to identify cancer cells among normal tissue -as the normal tissue becomes perturbed, the entropy correspondingly increases (20,21).
Given these various applications, we would suggest that entropy score be carefully interpreted to the context of the data being compared. However, as CM maturation is a notably unidirectional process with a clearly defined endpoint, entropy score should robustly allow for quantification of this process. Fig 2a. Could this be related to difficulty in dissociation, as part of stress response?

The authors note "higher mtGENE in differentiated cells and later time points." -
We have found that interpreting the mitochondrial read percentage in CM datasets is somewhat tricky. On the one hand, we note that whole heart bulk RNA-seq datasets (with no dissociation step) show high percentages of mitochondrial RNA reads compared to other tissues. This issue, for example, has been discussed by Mercer et al. (22), and more recently by Zhou (23). This also matches with expectations, as CMs are highly mitochondria rich, and increase in number and size of mitochondria is associated with CM maturation. Thus, we anticipate that the increasing mitochondrial percentage over timepoint (up to ∼45% in adult CMs) is likely biological. On the other hand, excessively high mitochondrial read percentages (as indicated in Fig.  3a) are likely due to cell damage (leading to loss of cytoplasmic but not mitochondrial RNAs) or stress response. (More details can be found at (24).) The authors note "In particular, 10x Chromium and STRT-seq datasets appeared to have systematically higher percentages of ribosomal protein-coding genes than other protocols." Could this simply be due to higher transcript capture rate of these protocols? These protocols/techniques may not be statistically sampling a cell's transcripts at the same rate as the techniques with "lower" capture efficiency.
While this is possible, it should be noted that the Smart-seq2 and mcSCRB-seq protocols are currently the most sensitive library preparation protocols. In particular, mcSCRB-seq has been optimized for high transcript capture compared to other protocols (25). However, both of these protocols show generally low ribosomal protein-coding gene rates. Anecdotally, we have heard from others about high ribosomal protein-coding genes in 10x Chromium data, and the percentages suggested by the company themselves are notably high (https://kb.10xgenomics.com/hc/en-us/articles/218169723-What-fraction-of-reads-map-to-ribo somal-proteins-). It is possible that this phenomenon has more to do with the binding properties of the specific primers used in the 10x Chromium protocol, leading to a systematic bias, but this would need to be further studied. As many studies treat ribosomal protein-coding genes as contaminants or uninformative, we proceeded similarly and filtered them from consideration.

Can entropy score be used in the context of activation (under external stimulus) or deactivation (when the external stimulus is removed)?
We hope that entropy score will be useful for this exact case, especially in the context of aiming to generate more mature PSC-CMs through application of external stimulus. To date, there have been few scRNA-seq datasets of PSC-CM perturbations, and our own dataset collection was unfortunately interrupted by the pandemic. However, as a test case, we applied entropy score to two recent perturbation datasets of PSC-CMs (Fig. SR4). The first approach, generated by Giacomelli, Meraviglia, and Campostrini et al., used co-culture of PSC-CMs with non-CMs (26). This approach yielded limited change in entropy score (Fig. SR4a). The second study, from Lam et al., used both anistropic patterning and tissue engineering (27). While this approach yielded a decrease in entropy score over control with both methods, the end CMs still failed to achieve a maturity greater than corresponding fetal human tissue or D100 PSC-CMs (Fig. SR4b).

What do the black dots represent in Fig 2c?
These were simply outliers on the box-and-whiskers plot. We apologize for the confusion -because of a graphic error the outlier points were plotted to be extremely large. We have corrected this in the updated draft.

Response to Reviewer 3
The calculation of the entropy is not clear enough (or not performed correctly). Shouldn't Pi be the GeX distribution of Gene i across all cells? The authors seem to have calculated Pi as the probability of expression in one cell then summed across. Unless I am wrong, this does not make sense and invalidates all the analysis.

Also Pi being a probability, how was the normalization performed so that the sum of the probability is 1? Given the variability in gene expression, scRNAseq platforms and number of cells it would be good to have a metric estimating the quality of the distribution.
We appreciate the chance to further clarify the rationale behind our approach. Our definition of transcriptomic entropy draws from other studies. The premise is to use Shannon entropy as a diversity index of gene expression in a single cell. An immature cell will have generally higher diversity -more genes expressed, and cell-type specific genes expressed at lower levels. Conversely, a mature cell will have lower diversity -fewer genes with higher expression in genes critical for cell function. This diversity is computed on the single cell level -e.g., what is the distribution of genes within one single cell? Thus, we compute entropy for a single cell by using the probability of each gene within the cell (which is effectively just a scaled expression value of the gene). Pi is thus the probability of selecting a count of a particular gene from the bag of all gene counts within a cell. We can then compute Pi log(Pi) for each gene and sum for the single cell (though, as discussed in the manuscript, our eventual approach subset to only the top 1000 expressed genes in each cell). What we are interested in is the distribution for the single cell, rather than the distribution of a particular gene across a population of cells. This approach led to several advantages. Firstly, entropy score can be calculated using information within just one cell. Thus, unlike dimensionality reduction methods, which rely on relationships between cells, entropy can be computed effectively on datasets with fewer cells or perhaps only one biological timepoint (and therefore possibly less variation in specific genes across that dataset). The second benefit is that entropy score does not rely on any on particular gene or set of genes. This fact was critical in enabling cross-study comparisons. We and others have generally observed that expression of specific genes is not stable across multiple studies. Thus, a method that relies on directly comparing expression levels will inevitably be complicated by batch effects -indeed, this is likely why trajectory reconstruction methods do not fully enable cross-study comparisons. By contrast, by relying on the overall gene distribution rather than a predefined set of genes, entropy score facilitates cross-study comparison.
We hope this also answers another question regarding normalization. Entropy simply normalizes genes within a cell by dividing by the total counts within the cell (effectively analogous to CPM normalization). This gives a probability that will sum to 1.
Entropy score correlated only moderately with pseudotimes for the three methods. This is a major problem that needs to be explained. One would expect entropy to give a higher correlation if it is a robust measure of development.
We thank the reviewer from bringing up this point -however, there are several important caveats to note. Firstly, we believe it is erroneous to assume that trajectory reconstruction methods are inherently a gold standard for single cell development. Trajectory reconstruction methods carry their own assumptions, namely the validity of dimensionality reduction as an approach for capturing gene expression changes over dynamic processes. This assumption does not necessarily always hold (28). Individual trajectory reconstruction methods may also carry their own specific variations and assumptions, which will also change their downstream predicted pseudotimes. Given that transcriptomic entropy is based on a completely different framework, without reference to dimensionality reduction or even intra-cell distances, we are not surprised to see some variation between the methods. This may compounded by single cell noise, where cells with very close transcriptional states may be placed in slightly different orders relative to one another across different methods.
We further note that entropy shows generally strong concordance with pseudotime methods during the range of embryonic and perinatal CM maturation before deviating more significantly from pseudotime methods once maturity is reached. We further validated this observation using a more extensive CM maturation reference from our recent study (8) (Fig. SR5). We again expect that these variations in the ordering of late-stage CMs are due to inherently different assumptions in the methods. However, given the good concordance during the critical perinatal period of CM maturation, during which the dynamic range is most biologically important, we believe that entropy is effective for quantifying maturation status. Moreover, the high concordance in identified differentially expressed genes between entropy and pseudotime methods further supports the fact that they are capturing a similar biological maturation process. One of the main purposes of the approach is to classify maturation of in vitro datasets, but basically no entropy changes are found. They are minimal in Figure 5c. Following with this, the developmental times of the datasets as shown by color codes do not match the changes in entropy (see Figs 4b, 5a/b).
As noted by the reviewer, one of our main motivations for developing entropy score was to classify PSC-CM maturation. Fig.  5a is somewhat tricky to interpret, as unlike in vivo, where there is a consistent maturation program, PSC-CM maturation is also confounded by differences in protocols and cell lines. Thus, while there are broad changes in maturation between, for example, D9-12 vs D45-D100, cells from different datasets will proceed at slightly different rates. More clear differences across timepoint can be seen by looking at each study individually. We also agree with the observation by the reviewer that the range of maturation in PSC-CMs (Fig. 5a) and reprogrammed CMs (Fig. 5b) is minimal. This is, we believe, a key finding from the study. It is an assumption that PSC-CMs are undergoing a maturation process in vitro -however, our belief is that any maturation process that is happening is exceedingly minimal, particularly in comparison to the scope of changes that are happening in vivo. Even late-stage PSC-CMs fail to achieve greater than a fetal-like entropy score, and appear arrested at the onset of the in vivo perinatal phase. This observation is consistent with data from numerous other studies, and indeed underlies the observation that PSC-CMs are immature. In our recent study (8), we probed this observation more thoroughly, and found that mouse and human PSC-CMs generally fail to undergo the perinatal CM maturation programs that happen in vivo. We believe the data in this manuscript are consistent with these findings.
Regarding the color codes in the two mentioned figures -the colors simply reflect the source of the data (e.g. which study). The purpose of color-coding in this manner was to demonstrate consistency of entropy scores across studies, particularly in vivo.
Why is the entropy not compared between the Kannan dataset and Wang and Yao? This would prove that indeed entropy is a good measure as opposed to UMAP+monocle.
We apologize for the confusion here -both of these datasets are included in Fig. 4b. We show a comparison below here as well (Fig. SR6). Given that we have observed maturation states to overlap significantly from e18 -p4 (8,16), we believe these results match our expectations.