Fig 1.
The analytic pipeline consists of three main components: pre-processing, cell type identification, and cell type specific gene signature and driving force identification.
Fig 2.
Identification of Major Lung Cell Types.
Cells (n = 148) from two sample preparations of fetal mouse lung at E16.5 were assigned to 9 clusters via hierarchical clustering using average linkage and centered Pearson’s correlation. Each color represents a distinct cell cluster, labeled C1-C9. Rectangles represent single lung cells from the first preparation, and ellipses represent single cells from a second, independent preparation. A connecting line indicates that the z-score correlation between two cells is ≥ 0.05. Blue lines connect cells within the same preparation, while red lines connect cells across preparations.
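As a rough illustration of the clustering named in this caption, the sketch below implements average-linkage agglomerative clustering with a distance of 1 minus the centered Pearson correlation. It is not the authors' code: the function names, the toy expression profiles, and the choice of stopping at a fixed number of clusters are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Centered Pearson correlation between two expression profiles."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the mean of all pairwise
    cell-to-cell distances (average linkage).
    """
    n = len(profiles)
    # Pairwise cell distances: 1 - Pearson correlation.
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


# Hypothetical toy data: 6 cells x 4 genes, forming three obvious groups.
toy_cells = [
    [9, 8, 1, 0], [8, 9, 0, 1],
    [1, 0, 9, 8], [0, 1, 8, 9],
    [1, 9, 1, 9], [0, 8, 2, 9],
]
clusters = average_linkage(toy_cells, n_clusters=3)
```

The naive loop recomputes linkage distances on every merge, which is acceptable for a few hundred cells but would be replaced by a library routine for larger data sets.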
Fig 3.
Validation of Cell Type Assignments using Known Biomarkers.
(A) Expression patterns of representative known cell type markers were used to validate the assignment of major lung cell types at E16.5. Expression levels were normalized by per-sample z-score transformation. (B) ROC curves of the rank-aggregation-based validation showed high consistency (AUC > 0.8) between the cell type assignments and the expression patterns of known cell type specific markers (S2 Table).
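The two quantities mentioned in this caption, a per-sample z-score and an AUC, can be sketched as follows. This is a minimal example under stated assumptions: the population standard deviation is used for the z-score (the caption does not say which estimator was applied), the AUC is computed for a single ranking rather than for the rank-aggregation procedure itself, and all names and numbers are hypothetical.

```python
from statistics import fmean, pstdev


def zscore_per_sample(sample):
    """Z-score all gene values within one sample (one cell).

    Assumes the sample is not constant; a zero standard deviation
    would raise ZeroDivisionError.
    """
    mu = fmean(sample)
    sd = pstdev(sample)  # assumption: population standard deviation
    return [(v - mu) / sd for v in sample]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive item scores
    higher than a randomly chosen negative item; ties count as 0.5.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: label 1 means a cell belongs to the assigned cell
# type, and the score is its aggregated marker expression.
z = zscore_per_sample([1.0, 2.0, 3.0])
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])
```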
Fig 4.
Prediction of Cell Types for Each Cluster using Cell Type Enrichment Analysis.
Information on gene expression in certain cell types was downloaded from the EBI Expression Atlas (http://www.ebi.ac.uk/gxa). Results were obtained using differentially expressed genes as the input gene lists. The lengths of the bars represent the transformed p-values (−log10(p)) of highly enriched cell types for each cell cluster, where p is calculated by a one-tailed Fisher’s exact test and represents the degree of enrichment of a cell type in a given cell cluster.
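The bar lengths described in this caption can be reproduced in principle with a one-tailed Fisher's exact test, which for enrichment is the upper tail of a hypergeometric distribution. The sketch below assumes a simple 2x2 setup; the parameter names and the example counts are hypothetical and not taken from the study.

```python
from math import comb, log10


def fisher_enrichment_p(k, n, K, N):
    """One-tailed Fisher's exact test for over-representation.

    N: genes in the background
    K: background genes annotated to the cell type
    n: genes in the input list (e.g. differentially expressed genes)
    k: input genes annotated to the cell type
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 3 of 4 input genes hit a 5-gene cell type set
# drawn from a background of 20 genes.
p = fisher_enrichment_p(k=3, n=4, K=5, N=20)
bar_length = -log10(p)  # the transformed p-value plotted as a bar
```

Exact integer arithmetic with `math.comb` avoids the rounding problems of factorial-based formulas, at the cost of speed for very large backgrounds.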
Fig 5.
Predicted Signature Genes for Major Lung Cell Types.
(A) The heatmap shows that the predicted cell type specific signature genes are selectively expressed in the defined cell types. Gene expression was normalized by per-sample z-score transformation. (B) The top 20 signature genes for each lung cell type, ordered by ranking score, are listed. Genes in red are known markers that were used to train the signature prediction models.
Fig 6.
Mouse Lung Epithelial Specific Transcriptional Regulatory Network.
(A) Ranked importance of transcription factors (TFs) in the main connected component of the epithelial specific transcriptional regulatory network (TRN). The sizes of the TF nodes are proportional to their average-ranked node importance. The main connected component of the epithelial TRN comprises 348 nodes and 432 edges. The nodes in red are TFs, and the nodes in grey are genes that are differentially expressed (p-value < 0.01) in epithelial cells and are not TFs. The edges were established using the first-order conditional dependence approach described in the Methods section, with a cutoff of 0.05. (B) The Hopx local network (the first hop is shown). Hopx was the top-ranked TF identified by driving force analysis (Table 1).
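Two graph operations are implied by this caption: extracting the main (largest) connected component of a network, and taking the first-hop neighborhood of one node. The sketch below shows both on an undirected edge list. The node labels and edges are hypothetical placeholders, not data from the study, and the edge-inference and node-importance steps are not reproduced here.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def main_component(adj):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node] - component:
                component.add(nb)
                queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def first_hop(adj, node):
    """The node together with its direct neighbors."""
    return {node} | set(adj[node])


# Hypothetical toy network: TF_* stand for transcription factors,
# g* for non-TF differentially expressed genes.
toy_edges = [("TF_A", "g1"), ("TF_A", "g2"), ("g2", "TF_B"), ("TF_C", "g3")]
adj = build_adjacency(toy_edges)
core = main_component(adj)
local = first_hop(adj, "TF_A")
```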
Table 1.
Top 20 Predicted Key Transcription Factors for Lung Epithelial Cells at E16.5.
Table 2.
Top 20 Predicted Regulatory Targets of Nkx2-1 Identified from a Consensus among Expression based Prediction, ChIP-seq, and Literature Evidence.