Table 1.
Comparison of different computational methods for GRN inference, highlighting their input data type, requirement for prior information, capability to generate new data, support for multiple gene knockout (KO) simulations, and ability to infer directed interactions. An asterisk (*) indicates that the feature is theoretically supported by the method but was not implemented or discussed in the referenced publication.
Fig 1.
IGNITE framework and mouse scRNA-seq dataset analysis.
A. Overview of the IGNITE workflow: from the scRNA-seq input, through processing, modelling, and data generation, to the final outputs: (i) the effective inferred gene regulatory network (GRN), (ii) wild-type generated gene activity data, and (iii) gene activity generated using inferred GRNs with gene perturbations. QC indicates quality control. B. Studied biological system: temporal dynamics of PSC differentiation, encompassing the transition through distinct stages from naïve to formative. The transcriptomic dataset considered is 10X scRNA-seq of cells starting in 2i + LIF (2iL). The points represent the 5 time points at which cells were sampled and measured after the removal of 2iL. C. Average gene expression (GE) z-score values from the log-normalized scRNA-seq input dataset. For each gene, the average is computed over all cells that belong to the same time point. We considered 24 genes, divided into four categories: naïve (light blue), formative early (yellow), formative late (red), and others (grey). We classified these genes following Carbognin et al. [14]. D. Two-dimensional UMAP visualisation of the log-normalized scRNA-seq dataset, with each point representing an individual cell. The gradient colour scale corresponds to the pseudotime value of each cell, indicating the time progression within the dataset. Pseudotime ranges from 0 to 38.22. UMAP1 and UMAP2 indicate the two dimensions of the UMAP space. E. UMAP visualisation of three key genes (Klf4, Otx2, Utf1) in the log-normalized scRNA-seq dataset. Each point represents an individual cell, coloured by the normalized expression of the respective gene. F. Mini-Bulk (MB) gene expression of Klf4, Otx2, and Utf1 along pseudotime (PST). The red dashed line indicates half-maximum expression.
Fig 2.
Selection of the best-performing mouse GRN.
A. Density plot of the Correlation Matrices Distance (CMD) values for the 250 inferred GRNs. The red dashed line represents the selected GRN with the lowest CMD. B. Pearson correlation matrix for the gene activity (GA) of the input dataset (scRNA-seq data with LogNorm, PST, and MB). C. Pearson correlation matrix for the IGNITE-generated gene activity dataset. D. Hierarchical clustering of gene activity of the input dataset. The clustering algorithm used is Ward’s method. Each row represents a gene, while each column corresponds to an individual cell. The colour indicates inactive (−1, yellow) or active (+1, blue) gene activity. The dataset has 9547 cells. E. Hierarchical clustering of gene activity of IGNITE-generated data. Methodology for visualisation as in Fig 2D. 9547 cells were simulated. F. Principal Component Analysis (PCA) scatter plot representing the gene activity (GA) for the input dataset (scRNA-seq data with LogNorm, PST, and MB). Each point corresponds to a single cell, and the colour intensity reflects the pseudotime value of the cell. PC1 and PC2 indicate the two dimensions of the PCA space. G. PCA scatter plot representing the IGNITE-generated wild-type gene activity data (WT GA), using the same dimensional reduction approach as in Fig 2F.
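The caption does not define the CMD. As a minimal sketch, assuming the commonly used correlation matrix distance, CMD = 1 − tr(R1 R2) / (‖R1‖_F ‖R2‖_F), the comparison between the input and generated correlation matrices could look as follows. The function name and the 3×3 toy matrices are illustrative and not taken from the paper.

```python
import math

def correlation_matrix_distance(r1, r2):
    """CMD between two square matrices given as nested lists (assumed definition)."""
    n = len(r1)
    # tr(R1 @ R2) = sum over i, j of R1[i][j] * R2[j][i]
    trace = sum(r1[i][j] * r2[j][i] for i in range(n) for j in range(n))
    # Frobenius norms of both matrices
    norm1 = math.sqrt(sum(v * v for row in r1 for v in row))
    norm2 = math.sqrt(sum(v * v for row in r2 for v in row))
    return 1.0 - trace / (norm1 * norm2)

# Toy correlation matrices (made-up values, for illustration only).
corr_input = [[1.0, 0.8, -0.5], [0.8, 1.0, -0.3], [-0.5, -0.3, 1.0]]
corr_generated = [[1.0, 0.7, -0.4], [0.7, 1.0, -0.2], [-0.4, -0.2, 1.0]]

cmd_identical = correlation_matrix_distance(corr_input, corr_input)
cmd_generated = correlation_matrix_distance(corr_input, corr_generated)
print(cmd_identical, cmd_generated)
```

Under this assumed definition, a value of 0 means the two matrices are equal up to a scale factor, so choosing the GRN with the lowest CMD favours generated data whose gene-gene correlations best match those of the input.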
Table 2.
Spearman correlation (Spearman rank correlation test) between predicted and experimental KO–WT average gene activity changes for each method. Values in parentheses indicate the associated p-values in scientific notation. Statistically significant correlations (p < 0.05), whether positive or negative, are shown in bold.
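As an illustration of the statistic reported in this table, the following sketch computes a Spearman rank correlation as the Pearson correlation of average ranks. The helper names and the five toy KO–WT changes are made up for the example; it does not compute the p-values shown in parentheses.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy per-gene KO-WT changes (made-up values, for illustration only).
predicted = [0.1, -0.3, 0.5, -0.7, 0.2]
experimental = [0.2, -0.1, 0.9, -0.8, 0.4]
rho = spearman(predicted, experimental)
print(rho)
```

In practice, a library routine such as `scipy.stats.spearmanr` returns both the coefficient and an associated p-value.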
Fig 3.
IGNITE predictions of single and triple KO perturbations in mPSCs and comparison with experimental benchmarks.
A. Scaled KO–WT difference for Rbpj, Etv5, and Tcf7l1 single knockouts, computed from experimental data [39] and from simulations with IGNITE, SCODE, and CellOracle. For each gene, the simulated scaled KO–WT difference was calculated as the scaled difference between the average fraction of active cells in wild-type and knockout conditions. For the experimental data, scaled log2FC values from [39] were used. All quantities were scaled between −1 and +1 to facilitate comparison across datasets (see Methods for details). B. Scaled KO–WT difference for the triple knockout, computed from experimental data [38] and from simulations with IGNITE, SCODE, and CellOracle. The simulated scaled KO–WT difference was calculated as the scaled difference between the average fraction of active cells in wild-type and knockout conditions. For the experimental data, scaled log2FC values from [38] were used. All quantities were scaled between −1 and +1 to enable comparison across datasets (see Methods for details). C. Hierarchical clustering of the IGNITE-generated triple KO GA. The clustering algorithm used is Ward’s method. Each row represents a gene, while each column corresponds to an individual cell. The colour indicates inactive (−1, yellow) or active (+1, blue) gene activity. 9547 cells were simulated. D. PCA scatter plot representing the gene activity (GA) of the input dataset (scRNA-seq data with LogNorm, PST, and MB) and the IGNITE-generated triple KO GA. The colour gradient represents the cell density within each square area, with separate scales for the input (blue) and the generated triple KO cells (orange). E. Spearman’s correlation between simulated and experimental scaled KO–WT differences for Rbpj, Etv5, Tcf7l1, and the triple KO. Results are based on the 10 GRNs inferred with IGNITE that exhibited the lowest CMD values (out of 250 tested models). Boxes indicate the interquartile range, the horizontal line marks the median, and whiskers represent the full range of non-outlier data.
Table 3.
Fraction of Agreement (FoA) between predicted and experimental KO–WT gene activity changes for each method. Statistically significant FoA values (p < 0.05, binomial test) are shown in bold.
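The caption does not spell out how the FoA or its test is computed. The sketch below assumes that the FoA is the fraction of genes whose predicted and experimental KO–WT changes have the same sign, and that significance is assessed with a one-sided exact binomial test against a chance level of 0.5. Both choices, the function names, and the eight toy values are assumptions for illustration.

```python
from math import comb

def fraction_of_agreement(predicted, experimental):
    """Return (FoA, number of agreements, number of genes compared)."""
    # Assumption: genes with a zero change in either dataset are skipped.
    pairs = [(p, e) for p, e in zip(predicted, experimental) if p != 0 and e != 0]
    agree = sum((p > 0) == (e > 0) for p, e in pairs)
    return agree / len(pairs), agree, len(pairs)

def binomial_p_value(successes, trials, chance=0.5):
    """One-sided exact binomial test: P(X >= successes) under `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Toy per-gene KO-WT changes (made-up values, for illustration only).
predicted = [0.4, -0.2, 0.7, -0.9, 0.1, -0.3, 0.5, 0.6]
experimental = [0.3, -0.5, 0.2, -0.1, -0.4, -0.6, 0.8, 0.9]

foa, agree, n = fraction_of_agreement(predicted, experimental)
p_value = binomial_p_value(agree, n)
print(foa, p_value)
```

With these toy values, 7 of 8 genes agree in sign, so the tail probability is the chance of 7 or more agreements out of 8 under random sign assignment.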
Fig 4.
GRN inferred by IGNITE and comparison with literature-supported interactions.
A. Interaction matrix of the GRN inferred with IGNITE from the input dataset (log-normalized scRNA-seq with PST and MB). Genes are grouped as in Fig 1C: naïve (blue), formative early (yellow), formative late (red), and others (grey). Matrix elements (i,j) represent the interaction from regulator j to target i, with positive (red) and negative (blue) values. B. Subnetwork of 18 interactions reported in the literature. Only the involved genes are shown, coloured as in A. Arrows denote activating (red) or inhibitory (blue) interactions. C. Subnetwork of literature-reported interactions correctly recovered by IGNITE. The arrows represent the interactions that IGNITE inferred with the correct sign.
Table 4.
Comparison of the four inference methods considered, using the mouse scRNA-seq dataset (log-normalized and processed with PST and MB) as input. The table reports (i) the fraction of correctly inferred known interactions (FCI) and (ii) the Spearman correlation (with associated p-value) between the interaction strengths inferred by IGNITE and by each other method.
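As a minimal sketch of the FCI, assuming that a known interaction counts as correctly inferred when the inferred interaction strength has the same sign as the literature-reported one, the computation could be written as below. The gene names, signs, and strengths are placeholders, not interactions from the paper.

```python
def fraction_correctly_inferred(inferred, known):
    """Fraction of known signed interactions recovered with the correct sign.

    inferred: {(regulator, target): signed interaction strength}
    known:    {(regulator, target): +1 (activation) or -1 (inhibition)}
    """
    correct = sum(
        1
        for edge, sign in known.items()
        # A missing edge is treated as strength 0 and counts as not recovered.
        if inferred.get(edge, 0.0) * sign > 0
    )
    return correct / len(known)

# Placeholder interactions (hypothetical genes and values, for illustration only).
known = {
    ("geneA", "geneB"): +1,
    ("geneB", "geneC"): -1,
    ("geneC", "geneA"): +1,
    ("geneA", "geneC"): -1,
}
inferred = {
    ("geneA", "geneB"): 0.8,   # correct sign
    ("geneB", "geneC"): -0.2,  # correct sign
    ("geneC", "geneA"): -0.5,  # wrong sign
}

fci = fraction_correctly_inferred(inferred, known)
print(fci)
```

Here the edges are keyed as (regulator, target); an interaction matrix whose element (i, j) stores the effect of regulator j on target i would be read as `matrix[target][regulator]`.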
Fig 5.
Comparison of CMD and FCI selection criteria in mouse GRN inference.
A. Relationship between the fraction of correctly inferred interactions (FCI) and the Correlation Matrices Distance (CMD) for the 250 GRNs inferred with IGNITE from the input dataset (log-normalized scRNA-seq with PST and MB). Each point is a GRN inferred with a specific set of hyperparameters. The 10 models with the lowest CMD are highlighted in red, and the 10 with the highest FCI in orange.
Table 5.
Spearman correlation (mean ± SEM) between simulated and experimental KO–WT differences for the ten GRNs with lowest CMD or highest FCI. P-values are from one-sided t-tests against zero, with FDR correction across the four KO conditions. Significant results are marked with stars (* p < 0.05, ** p < 0.01, *** p < 0.001).
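The caption does not name the FDR procedure. The sketch below assumes the Benjamini-Hochberg correction applied across the four KO conditions and then maps adjusted p-values to the star notation given above. The raw p-values are made up for illustration, and the t-tests themselves are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def stars(p):
    """Map a p-value to the star notation used in the table."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Made-up raw p-values for the four KO conditions (illustration only).
conditions = ["Rbpj", "Etv5", "Tcf7l1", "triple KO"]
raw_p = [0.01, 0.04, 0.03, 0.20]

adjusted_p = benjamini_hochberg(raw_p)
labels = [stars(p) for p in adjusted_p]
for name, p, label in zip(conditions, adjusted_p, labels):
    print(f"{name}: adjusted p = {p:.4f} {label}")
```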
Table 6.
Spearman correlation (mean ± SEM) between simulated and experimental KO–WT differences for the best GRN inferred from 500 randomly subsampled cells. P-values are from one-sided t-tests against zero, with FDR correction across KO conditions.
Fig 6.
Human PSC dataset preprocessing.
A. Studied biological system [31]: temporal dynamics of human PSC differentiation, spanning pluripotency, mesendoderm, and definitive endoderm. The dataset is based on Smart-seq scRNA-seq. Cells were sampled at six time points: 0h (n = 92), 12h (n = 102), 24h (n = 66), 36h (n = 172), 72h (n = 138), and 96h (n = 188). B. PCA of the log-normalized dataset. Each point represents a cell, coloured by clustering label. C. PCA of the dataset, with cells coloured by pseudotime. Pseudotime ranges from 0 to 183.58. D. PCA visualisation of three key genes (POU5F1, T, and GATA4) in the log-normalized scRNA-seq dataset. Each point represents an individual cell, coloured by the normalized expression of the respective gene. E. Mini-Bulk gene expression patterns along pseudotime for three representative genes, POU5F1, T, and GATA4.
Fig 7.
Inference and validation of IGNITE in hPSCs: GRN and generated data.
A. Pearson correlation matrix of gene activity in the input dataset (human PSC differentiation scRNA-seq data with LogNorm, PST, and MB). B. Pearson correlation matrix of the gene activity generated with IGNITE. C. PCA of gene activity for three datasets: input, from scRNA-seq data with LogNorm, PST, and MB (left), IGNITE-generated wild type (middle), and IGNITE-generated knockout of POU5F1 (right). The colour gradient represents the cell density within each square area. Dashed contours denote cluster boundaries defined on the input (left) and reused for the generated datasets. D. Fraction of cells assigned to clusters (1–5) or excluded areas, for the same three datasets shown in C. For generated data, bars show means across replicated simulations (150 simulated datasets; error bars indicate SEM). Bar colours match the clusters in C.
Fig 8.
Evaluating IGNITE predictions of gene knockouts in hPSC differentiation.
A. Difference in cluster composition between WT- and KO-generated data, averaged across 150 replicates. Rows correspond to simulated single-gene KOs, columns to clusters (1–5) or excluded cells. Rows were ordered by hierarchical clustering, which also identified 11 groups shown by colour. Red indicates an increase after KO, blue a decrease. B. PCA of cluster composition across conditions: WT input, WT-generated, and each single-gene KO. Each point corresponds to one condition and is positioned by the fractions of cells across clusters (1–5) and excluded cells (fractions for generated conditions are averaged over 150 replicates). Colours indicate the groups defined in Fig 8A. KO conditions corresponding to the genes shown in A are indicated. C. Differentiation score for selected KO genes, chosen to overlap with those analysed in [31], computed as described in Methods. The axis break highlights the outlier magnitude of POU5F1.
Table 7.
Set of hyperparameter values for the random search. The momentum parameter is used only if the MGA optimiser is selected.
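To illustrate the conditional structure described here, the following sketch draws random configurations and adds a momentum value only when the MGA optimiser is chosen. The hyperparameter names, the candidate values, and the second optimiser name are placeholders and do not reproduce the contents of this table; the count of 250 mirrors the number of GRNs inferred in the mouse analysis.

```python
import random

# Hypothetical search space: names and values are placeholders,
# not the ones listed in the table.
SEARCH_SPACE = {
    "optimiser": ["MGA", "other_optimiser"],
    "learning_rate": [0.01, 0.05, 0.1],
    "regularisation": [0.0, 0.001, 0.01],
}
MOMENTUM_VALUES = [0.5, 0.9]  # placeholder values

def sample_configuration(rng):
    """Draw one configuration uniformly at random from the search space."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    # Momentum is sampled only when the MGA optimiser is selected.
    if config["optimiser"] == "MGA":
        config["momentum"] = rng.choice(MOMENTUM_VALUES)
    return config

rng = random.Random(0)  # fixed seed for reproducibility
configurations = [sample_configuration(rng) for _ in range(250)]
print(configurations[0])
```

Each sampled configuration would then be used to infer one GRN, after which a selection criterion such as the lowest CMD picks the model to keep.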