Table 1.
A list of search parameters that can be passed to queryTME to filter the available datasets.
Fig 1.
A visualisation of the various tissue types included in TMExplorer.
TMExplorer includes 48 TME scRNA-seq datasets spanning 26 human cancer types from 13 sites and 4 mouse cancer types. TMExplorer is generalizable and extendable, and new datasets are added to the database as they become available. Fig 1 was created with BioRender.com.
Table 2.
List of tumor microenvironment scRNA-seq datasets included in TMExplorer.
Fig 2.
The format of the SingleCellExperiment objects containing TME datasets.
The Assay is a matrix or dgCMatrix containing the gene expression table, named according to the type of score (e.g. an Assay containing raw counts would be named “Counts”); colData is a DataFrame with one row per column of the Assay, describing the cells in the dataset; Metadata is a named list of additional metadata objects describing the dataset. A SingleCellExperiment object may contain one or more AltExps, which are nested SingleCellExperiment objects whose Assay holds a different score type.
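For readers working outside R, the constraint described in this caption (one colData row per Assay column) can be illustrated with a small Python analogue. This is only a sketch: the class `TMEDataset` and all of its field names are hypothetical and are not part of TMExplorer or of the SingleCellExperiment package.

```python
from dataclasses import dataclass, field


@dataclass
class TMEDataset:
    """Hypothetical Python analogue of the container in Fig 2 (not a real API)."""

    assay_name: str   # score type of the assay, e.g. "counts"
    assay: list       # genes x cells matrix, stored as a list of rows
    col_data: list    # one dict of annotations per cell (assay column)
    metadata: dict = field(default_factory=dict)  # e.g. signature gene sets
    alt_exps: dict = field(default_factory=dict)  # nested datasets keyed by score type

    def __post_init__(self):
        n_cells = len(self.assay[0]) if self.assay else 0
        # Every gene row must cover the same set of cells.
        if any(len(row) != n_cells for row in self.assay):
            raise ValueError("all assay rows must have the same number of cells")
        # colData must describe exactly the cells that appear as assay columns.
        if len(self.col_data) != n_cells:
            raise ValueError("col_data needs exactly one row per assay column")


# Toy example: 2 genes x 3 cells, with one annotation row per cell.
toy = TMEDataset(
    assay_name="counts",
    assay=[[5, 0, 2], [1, 3, 0]],
    col_data=[{"label": "T cell"}, {"label": "B cell"}, {"label": "T cell"}],
)
```

Constructing the object with two annotation rows for three cells would raise a `ValueError`, which mirrors the row/column agreement stated in the caption.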
Fig 3.
An overview of the main functions of TMExplorer.
A. queryTME allows users to search for datasets and returns them either as a descriptive table or as a list of SingleCellExperiment objects for analysis. B. saveTME allows users to write datasets to disk. For each dataset written to disk, up to three files are created: a table storing the expression data as either a CSV or Matrix Market file, depending on whether a dense or sparse matrix is passed to the function; a table containing the cells and their truth labels, if available; and a table containing the cell type signature gene sets, if available.
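When the expression data is written as a Matrix Market file, it can be read outside R with any Matrix Market parser (in practice `scipy.io.mmread` is the usual choice). The sketch below is a minimal standard-library parser for the coordinate format only, assuming a general (non-symmetric) real or integer matrix. The function name `parse_matrix_market` is hypothetical, and reading rows as genes and columns as cells is an assumption about the saved file, not something the caption states.

```python
def parse_matrix_market(lines):
    """Parse coordinate-format Matrix Market text into (shape, entries).

    `lines` is any iterable of text lines, e.g. an open file handle.
    `entries` maps zero-based (row, column) pairs to float values.
    """
    rows = iter(lines)
    header = next(rows, "")
    if not header.lower().startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("expected a coordinate-format Matrix Market file")

    # Skip comment lines (starting with '%') and blank lines before the size line.
    size_line = next(rows, "")
    while size_line.startswith("%") or (size_line and not size_line.strip()):
        size_line = next(rows, "")
    n_rows, n_cols, n_nonzero = (int(tok) for tok in size_line.split())

    entries = {}
    for line in rows:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Matrix Market indices are one-based; convert to zero-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)

    if len(entries) != n_nonzero:
        raise ValueError("number of entries does not match the size line")
    return (n_rows, n_cols), entries


# Toy input: a 3 x 2 matrix with two non-zero values.
example = [
    "%%MatrixMarket matrix coordinate integer general\n",
    "% toy example\n",
    "3 2 2\n",
    "1 1 5\n",
    "3 2 7\n",
]
shape, entries = parse_matrix_market(example)
```

For a file on disk, the same function can be called as `parse_matrix_market(open(path))`, where `path` is whatever file name was used when saving.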
Fig 4.
An example workflow using TMExplorer to obtain datasets for downstream analysis in Python and R.
Users start by calling queryTME to retrieve all datasets that have cell type labels and cell type signature gene sets; this returns a list of matching datasets as SingleCellExperiment objects. For R-based algorithms, users can pass the SingleCellExperiment objects directly if that is supported, or pass the individual components required. For Python-based algorithms, saveTME can be used to save the files for each dataset to disk, which can then be opened in Python for analysis.
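The Python side of this workflow could look roughly like the sketch below, which uses only the standard `csv` module. The file layout is an assumption made for illustration: the expression CSV is taken to have cell identifiers in the header row and one gene per subsequent row, and the label CSV is taken to have a header followed by `cell,label` rows. The function names are hypothetical, and the actual files written by saveTME should be inspected before relying on this layout.

```python
import csv


def read_expression_csv(handle):
    """Read a genes x cells table: returns (cell_ids, {gene: [values]})."""
    reader = csv.reader(handle)
    header = next(reader)
    cell_ids = header[1:]  # first header cell is assumed to be the gene column
    expression = {}
    for row in reader:
        if not row:
            continue
        expression[row[0]] = [float(value) for value in row[1:]]
    return cell_ids, expression


def read_labels_csv(handle):
    """Read a two-column table of cell identifiers and truth labels."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader if row}


# Toy input standing in for two saved files (assumed layout, see above).
expression_lines = [",c1,c2\n", "CD3E,4,0\n", "MS4A1,0,7\n"]
label_lines = ["cell,label\n", "c1,T cell\n", "c2,B cell\n"]

cell_ids, expression = read_expression_csv(expression_lines)
labels = read_labels_csv(label_lines)
# Truth labels ordered like the expression columns (None if a cell is unlabelled).
ordered_labels = [labels.get(cell_id) for cell_id in cell_ids]
```

With real files, each list of lines would be replaced by an open file handle, for example `open(path, newline="")`.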
Fig 5.
A summary of TMExplorer contents.
Here, we summarise the number of human and mouse datasets in TMExplorer (A); the number of datasets generated by each sequencing technology (B); the number of datasets for which cell type labels and gene signatures are available (C); and the distributions of score types (D) and tumour types (E) across datasets. In addition, boxplots of the number of cells, genes, tumours and patients across datasets are provided (F).
Fig 6.
A flowchart of data query and analysis using TMExplorer.
TMExplorer provides search and analysis capabilities: users can look up and retrieve datasets of interest and view the expression matrix, cell type labels and metadata, including gene signatures (if available). They can then continue in R for data visualization and analysis, or save the datasets in CSV format for analysis in their programming language of choice (e.g. Python).
Fig 7.
A case study on using TMExplorer to identify cell types.
This case study shows how TMExplorer can be used to obtain datasets for cell cluster labelling with Seurat and GSVA. queryTME returns the datasets that have both the gene signatures and the cell type annotations required for testing automated cell type identification. The expression data can be passed to Seurat for cell clustering, and the gene signatures can be used by GSVA to identify the cell types in Seurat’s clusters. Finally, the cell type annotations can be used as truth labels to measure the performance of Seurat clustering followed by GSVA.
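The final evaluation step can be sketched in a few lines of Python. Note that this is not GSVA: as a deliberately simple stand-in, each cluster is assigned the cell type whose signature genes have the highest mean expression in that cluster, and the result is then compared with the truth labels. All function names, the toy values and the per-cluster mean inputs are illustrative assumptions.

```python
from statistics import mean


def label_clusters(cluster_means, signatures):
    """Assign each cluster the cell type with the highest mean signature score.

    cluster_means: {cluster: {gene: mean expression in that cluster}}
    signatures:    {cell_type: [signature genes]}
    This simple mean score stands in for GSVA and is not equivalent to it.
    """
    cluster_labels = {}
    for cluster, gene_means in cluster_means.items():
        scores = {
            cell_type: mean(gene_means.get(gene, 0.0) for gene in genes)
            for cell_type, genes in signatures.items()
            if genes
        }
        cluster_labels[cluster] = max(scores, key=scores.get)
    return cluster_labels


def accuracy(cell_clusters, cluster_labels, truth):
    """Fraction of cells whose cluster-derived label matches the truth label."""
    shared = [cell for cell in cell_clusters if cell in truth]
    if not shared:
        raise ValueError("no cells have both a cluster and a truth label")
    hits = sum(cluster_labels[cell_clusters[cell]] == truth[cell] for cell in shared)
    return hits / len(shared)


# Toy data: two clusters, two one-gene signatures, four cells.
cluster_means = {
    0: {"CD3E": 5.0, "MS4A1": 0.1},
    1: {"CD3E": 0.2, "MS4A1": 6.0},
}
signatures = {"T cell": ["CD3E"], "B cell": ["MS4A1"]}
cell_clusters = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
truth = {"c1": "T cell", "c2": "B cell", "c3": "B cell", "c4": "B cell"}

cluster_labels = label_clusters(cluster_means, signatures)
score = accuracy(cell_clusters, cluster_labels, truth)
```

Per-cell accuracy is only one possible measure; a clustering-agreement metric could be substituted without changing the overall shape of the comparison.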
Fig 8.
A case study on using TMExplorer for inferring CNVs.
This case study shows how TMExplorer can be used to obtain multiple datasets for a specific tumour type for use with CNV-based separation methods, such as CONICSmat. queryTME returns the datasets of a specific tumour type, such as glioblastoma. These datasets can then be passed directly to large-scale CNV inference methods, such as CONICSmat.