IQCELL: A platform for predicting the effect of gene perturbations on developmental trajectories using single-cell RNA-seq data

doi:10.1371/journal.pcbi.1009907

Fig 1.

Overview of IQCELL.

IQCELL infers logical GRNs directly from scRNA-seq data and allows the simulation and analysis of in silico developmental trajectories under normal and perturbed conditions. The typical inputs to IQCELL are scRNA-seq expression data along with the pseudo-time ordering of the cells. After dropout correction and gene selection, gene-gene interactions are calculated and weighted by mutual information. Binarized gene expression values are then used to constrain the possible gene-gene interactions and obtain a functional GRN for the data. IQCELL can then be used to analyze the GRN and simulate possible developmental trajectories under normal and perturbed conditions.
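As a rough illustration of weighting a candidate gene-gene interaction by mutual information, the sketch below computes the mutual information (in bits) between two binarized expression vectors. It is a minimal example under stated assumptions, not IQCELL's implementation: the function name `mutual_information` and the toy vectors are hypothetical, and the estimator is the plain plug-in estimate from observed frequencies.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of mutual information (bits) between two discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)            # marginal counts for gene X
    count_y = Counter(y)            # marginal counts for gene Y
    count_xy = Counter(zip(x, y))   # joint counts over cells
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Hypothetical binarized expression of two genes across four cells.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]   # perfectly co-varies with gene_a
gene_c = [0, 1, 0, 1]   # independent of gene_a in this toy sample

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

In this toy sample the perfectly co-varying pair carries one bit of mutual information and the independent pair carries none, which is the sense in which a higher value suggests a stronger candidate edge.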


Fig 2.

IQCELL initial processing of early T-cell development scRNA-seq data.

(A) Summary of the scope of the scRNA-seq data used as input to IQCELL [18]. ETPs, which originate from pre-thymic progenitors, progress through the DN2A, DN2B (coinciding with upregulation of Bcl11b and lineage commitment), and DN3 stages and eventually give rise to DP cells (not covered here). (B) Log-transformed expression matrix for selected genes from the scRNA-seq data along the pseudo-time axis. Gene expression is corrected for dropout effects using MAGIC [24]. Red indicates high expression, blue indicates low expression. (C) Smoothed binarized gene expression matrix (expression density). Gene expression values were binarized by clustering, averaged over a pseudo-time window, and then sorted by transition point from early to late. Red indicates high expression, blue indicates low expression. (D) The set of all possible gene-gene interactions, filtered by interaction hierarchy and mutual information. Positive and negative interactions are represented by blue and red edges, respectively. Edge width represents the relative amount of mutual information of the interaction.
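The binarize-then-smooth step described for panel (C) can be sketched as follows. This is only an illustrative sketch: the caption does not say which clustering method is used, so a one-dimensional two-means clustering is assumed here, and the function names `binarize_two_means` and `sliding_mean`, the window size, and the expression values are all hypothetical.

```python
def binarize_two_means(values, iterations=100):
    """Split values into low (0) and high (1) groups with 1-D two-means clustering."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # No variation: treat the gene as not expressed everywhere.
        return [0] * len(values)
    for _ in range(iterations):
        threshold = (lo + hi) / 2
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster centers have converged
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    return [int(v > threshold) for v in values]


def sliding_mean(bits, window):
    """Average binarized values over a sliding window of cells ordered by pseudo-time."""
    return [sum(bits[i:i + window]) / window
            for i in range(len(bits) - window + 1)]


# Hypothetical expression of one gene in cells already sorted by pseudo-time.
expression = [0.1, 0.2, 0.15, 2.0, 2.2, 1.9]
bits = binarize_two_means(expression)
density = sliding_mean(bits, window=3)
```

The smoothed values rise from 0 to 1 around the cell where the gene switches on, which is the kind of transition point by which the rows in panel (C) are sorted.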


Fig 3.

The provisional GRN for mouse early T-cell development inferred by IQCELL captures essential gene interactions and accurately simulates T-cell developmental trajectories.

(A) The provisional GRN for early mouse T-cell development. The GRN is obtained by constraining the possible interactions to both follow the in vitro data progression when executed as a logical network and maximize mutual information between gene pairs. Positive and negative interactions are represented by blue and red edges, respectively. (B) Of the 38 experimentally reported gene interactions in early mouse T-cell development [16], 29 are captured by the functional GRN model proposed by IQCELL. (C) Detailed comparison of the interactions proposed by IQCELL with the experimentally reported ones. Rows and columns represent regulators and effector genes, respectively. Blue indicates that the interaction is captured by the model directly (dark blue) or indirectly (light blue); in the latter case, the numbers indicate the number of intermediate genes. Dark gray indicates that the interaction is proposed only by IQCELL. Red indicates that an experimentally validated interaction is not present in the model. Light gray cells indicate no interaction. Genes downstream of Spi1 account for 50% of the experimentally reported interactions not captured by IQCELL. (D) PCA plot of the binarized scRNA-seq data, color-coded by the pseudo-time value attributed to each cell. Binarization is performed by clustering the scRNA-seq expression values into expressed and not-expressed levels. In addition, the binarized expression profiles of CLP, ETP, DN2A, DN2B, and DN3A cells were calculated from the Immgen microarray data [38] and overlaid on the scRNA-seq data. (E) The four initial states used in the simulations. The three variations of the state representing ETP arise from the noisy expression of the Notch1 and Hes1 genes in the recovered scRNA-seq data at early pseudo-time. Genes that are expressed (1) and not expressed (0) are represented by blue and grey circles, respectively. (F) PCA plot of the simulated developmental trajectories overlaid on the binarized scRNA-seq data. The two detected attractors are colored red, and the attractor that matches the DN3A state is marked by a star (*). The simulated data are color-coded by the average simulation step (average distance to the attractor of the simulation). (G) Average gene expression at each simulation step. All simulations start from the same initial condition (ETP) and move toward the same attractor (*). (H) Expression states of the steady-state attractors of the GRN model. Genes that are expressed (1) and not expressed (0) are represented by blue and grey squares, respectively. (I) Percentage similarity between the two attractors (vertical axis) and the binarized microarray expression profiles of CLP, ETP, DN2A, DN2B, and DN3A cells (horizontal axis) [38]. The average agreement between two random states is 50%.
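The similarity measure in panel (I), and the 50% baseline for random states, can be illustrated with a short sketch. It assumes that similarity is the percentage of genes whose binary state agrees between two profiles; the function name `percent_similarity` and the gene labels `g0` to `g2` are hypothetical. Enumerating every pair of binary states over a small gene set shows why two random states agree on half of the genes on average.

```python
from itertools import product


def percent_similarity(state_a, state_b):
    """Percentage of shared genes whose binary state agrees between two profiles."""
    genes = state_a.keys() & state_b.keys()
    if not genes:
        raise ValueError("the two states share no genes")
    matches = sum(state_a[g] == state_b[g] for g in genes)
    return 100.0 * matches / len(genes)


# Enumerate all pairs of binary states over three hypothetical genes.
genes = ["g0", "g1", "g2"]
all_states = [dict(zip(genes, bits)) for bits in product([0, 1], repeat=len(genes))]
scores = [percent_similarity(a, b) for a in all_states for b in all_states]
average_random_agreement = sum(scores) / len(scores)
```

Each gene matches in exactly half of all state pairs, so the average over all pairs is 50%, and an attractor that agrees with a reference profile well above this level is informative.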


Fig 4.

Testing the known effect of eight gene perturbations on in silico developmental trajectories.

(A) Schematic of the gene perturbations performed. In OE, the gene is always expressed (represented by 1), and in KO the gene is always silent (represented by 0). (B) PCA plot of the simulated developmental trajectories under perturbed conditions, overlaid on the binarized scRNA-seq data. The perturbations include KO of Notch1, KO of Tcf7, KO of Bcl11b, KO of Runx1, KO of Tcf12, KO of Myb, OE of Tcf12, and the double perturbation of OE of Tcf7 combined with KO of Notch1. (C) Expression states of the model attractors under perturbations. Genes that are expressed (1) and not expressed (0) are represented by blue and grey squares, respectively. (D) Percentage similarity between the model attractors under perturbations (vertical axis) and the binarized expression profiles of CLP, ETP, DN2A, DN2B, and DN3A cells (horizontal axis) [38] (left). Description of the known effect of each gene perturbation on T-cell development (right).
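The KO and OE scheme in panel (A) amounts to clamping a gene to 0 or 1 while the remaining genes follow their logical rules. The sketch below shows this on a toy three-gene cascade. Everything in it is an assumption made for illustration: the genes `A`, `B`, `C` and their rules are invented, the update scheme is synchronous, and `run_to_fixed_point` is a hypothetical helper rather than part of IQCELL.

```python
def run_to_fixed_point(rules, initial, perturbation=None, max_steps=50):
    """Synchronously update a Boolean network until it reaches a fixed point.

    `perturbation` maps gene -> 0 (KO) or 1 (OE); clamped genes ignore their rule.
    Returns the fixed-point state, or None if none is reached within max_steps.
    """
    perturbation = perturbation or {}
    state = {**initial, **perturbation}
    for _ in range(max_steps):
        nxt = {gene: int(rule(state)) for gene, rule in rules.items()}
        nxt.update(perturbation)  # re-apply the clamp after every update
        if nxt == state:
            return state
        state = nxt
    return None  # e.g. the network entered a cyclic attractor


# Toy cascade with invented rules: A sustains itself, A activates B, B activates C.
rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}
start = {"A": 1, "B": 0, "C": 0}

wild_type = run_to_fixed_point(rules, start)
ko_b = run_to_fixed_point(rules, start, perturbation={"B": 0})
oe_c = run_to_fixed_point(rules, {"A": 0, "B": 0, "C": 0}, perturbation={"C": 1})
```

In the unperturbed run the signal propagates down the cascade, whereas knocking out `B` blocks activation of `C`, and overexpressing `C` keeps it on even without its upstream activators.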


Fig 5.

Constructing the mouse early T-cell GRN based on an automated TF selection pipeline.

(A) Overview of the TF selection procedure. After selecting HVGs, IQCELL uses pySCENIC to select active regulons (TFs and their effectors), and then uses GDS to rank and select TFs for the final list. We added Notch1 to the list for its known biological importance, and Rag1 and Cd3g as biological markers of the DN3 stage. (B) Comparison of automated vs curated TF selection shows that the TF selection pipeline captures 8 of 14 genes in the curated gene list. (C) PCA plot of the binarized scRNA-seq data, color-coded by the pseudo-time value attributed to each cell. Binarization is performed by clustering the scRNA-seq expression values into expressed and not-expressed levels. (D) PCA plot of the simulated developmental trajectories overlaid on the binarized scRNA-seq data. The detected attractor is colored red and marked by a star (*). The simulated data are color-coded by the average simulation step (average distance to the attractor of the simulation). (E) Expression states of the steady-state attractor of the GRN model. Genes that are expressed (1) and not expressed (0) are represented by blue and grey squares, respectively. (F) Percentage similarity between the attractor (vertical axis) and the binarized microarray expression profiles of CLP, ETP, DN2A, DN2B, and DN3A cells (horizontal axis) [38]. The average agreement between two random states is 50%. (G) Comparison of gene perturbations between the automated and curated GRNs shows that the predicted states match in 17 of 18 perturbations.


Fig 6.

Constructing mouse erythropoiesis GRN.

(A) Summary of the scope of the scRNA-seq data used as input to IQCELL [17]. Erythroid progenitors (ErPs) arise from megakaryocyte/erythroid progenitors (MEPs). (B) Log-transformed expression matrix for selected genes from the scRNA-seq data along the pseudo-time axis. (C) The constructed GRN for mouse erythropoiesis. The GRN is obtained by constraining the possible interactions to both follow the in vitro data progression when executed as a logical network and maximize mutual information between gene pairs. Positive and negative interactions are represented by blue and red edges, respectively. (D) Of 16 known interactions [15], 11 were captured by IQCELL (top). Detailed comparison of the interactions proposed by IQCELL with the experimentally reported ones (bottom). Rows and columns represent regulators and effector genes, respectively. Blue indicates that the interaction is captured by the model directly (dark blue) or indirectly (light blue); in the latter case, the numbers indicate the number of intermediate genes. Dark gray indicates that the interaction is proposed only by IQCELL. Red indicates that an experimentally validated interaction is not present in the model. Light gray cells indicate no interaction. (E) PCA plot of the simulated developmental trajectories overlaid on the binarized scRNA-seq data (curated genes). The detected attractor is colored red and marked by a star (*). The arrow (blue to red) represents the direction of inferred pseudo-time. (F) Percentage similarity between the model attractors under perturbations (vertical axis) and the binarized expression profiles of MEPs and ErPs (horizontal axis) (left). Description of the known effect of each gene perturbation (right). (G) Comparison of automated vs curated TF selection shows that the TF selection pipeline captures 5 of 7 genes in the curated gene list. (H) PCA plot of the simulated developmental trajectories overlaid on the binarized scRNA-seq data (genes resulting from automated TF selection). The detected attractor is colored red and marked by a star (*). The arrow (blue to red) represents the direction of inferred pseudo-time. (I) Comparison of gene perturbations between the automated and curated GRNs shows that the predicted states match in 8 of 10 perturbations.
