Determining Physical Mechanisms of Gene Expression Regulation from Single Cell Gene Expression Data
Fig 3
In silico validation of single cell analysis pipeline.
To estimate the accuracy of the new tools developed in this paper, each tool was evaluated against simulated datasets. (A-C) The kinetic parameter estimation method was tested on simulated data from which 90% of the data had been randomly discarded, to mimic the loss of biological material through inefficient cDNA preparation. The known and estimated kinetic parameters were compared for Kon (A), Koff (B) and Kt (C). (D-F) SABEC was then tested on simulated datasets generated with the same kinetic parameters as those estimated for the cell populations in the Moignard single cell dataset (124 simulated cells for each of the 5 cell populations). SABEC was applied to 100 such simulated datasets, and the robustness of each clustering was measured by the proportion of ambiguously clustered cells (PAC). The panels show hierarchical clusterings of the consensus matrices from (D) the dataset whose clustering had the worst PAC score, (E) a randomly selected dataset, and (F) the dataset whose clustering had the best PAC score. The coloured bars along the side and bottom indicate the true class label of each cell. (G-H) Finally, the true positive and false positive rates were calculated for each proposed component of EPiK, for the union of the MP and Subset methods, and for the intersection of all three methods. For the MP and Subset methods, thresholds were set so that the false positive rate would be approximately 2%; receiver operating characteristic (ROC) curves for these methods are shown in S13 and S14.
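The PAC summary used in panels D-F can be illustrated with a minimal sketch. It assumes the standard consensus-clustering definition: each consensus matrix entry is the fraction of clustering runs in which two cells were placed in the same cluster, and PAC is the fraction of cell pairs whose entry falls strictly inside an intermediate "ambiguous" interval. The function name `pac_score` and the interval bounds 0.1 and 0.9 are hypothetical choices for illustration, not values taken from SABEC.

```python
from itertools import combinations


def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguously clustered cell pairs (PAC), a sketch.

    consensus: square, symmetric matrix (list of lists); entry [i][j] is the
        fraction of clustering runs in which cells i and j shared a cluster.
    lower, upper: assumed (hypothetical) bounds of the ambiguous interval.
    """
    n = len(consensus)
    if n < 2:
        raise ValueError("need at least two cells to form a pair")
    # Each unordered pair of cells is counted once (upper triangle only).
    pairs = list(combinations(range(n), 2))
    # A pair is ambiguous if the cells were grouped together in some runs
    # but apart in others, i.e. the consensus value is neither near 0 nor 1.
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] < upper)
    return ambiguous / len(pairs)


# Two perfectly stable clusters: no ambiguous pairs.
stable = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
print(pac_score(stable))  # 0.0
```

A lower PAC indicates a more robust clustering, which is why panel D (worst PAC) is expected to show less clean block structure than panel F (best PAC).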