Fig 1.
Schematic of Perturbation-Expression Analysis of Cell States (PEACS).
The PEACS pipeline consists of four steps: (1) perturb a heterogeneous population with a genetic or chemical perturbation, (2) profile gene expression, (3) factor the gene-expression matrix, and (4) analyze the factored gene-expression matrix to identify perturbations that altered cell-state proportions. Shown in the lower right is the SVD formula for matrix factorization.
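The factorization step (3) can be sketched with NumPy's SVD. This is a minimal illustration under stated assumptions, not the PEACS implementation. It assumes a genes × samples matrix, mean-centers each gene across samples (PCA-style centering is an assumption), and takes the per-sample coefficients to be the rows of VΣ. The helper names `factor_expression` and `top_genes` and the gene labels are hypothetical.

```python
import numpy as np


def factor_expression(X, k):
    """Rank-k SVD of a genes x samples expression matrix X.

    Assumption: each gene (row) is mean-centered across samples before factoring.
    Returns gene loadings (genes x k), per-sample coefficients (samples x k),
    and all singular values (e.g. for a Scree plot).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)
    # Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = U[:, :k]          # gene-expression vectors, one column per component
    coeffs = Vt[:k].T * s[:k]    # coefficient of each component for each sample
    return loadings, coeffs, s


def top_genes(loadings, genes, comp, n=3):
    """Genes with the most positive and most negative loadings on one component."""
    order = np.argsort(loadings[:, comp])
    negative = [genes[i] for i in order[:n]]
    positive = [genes[i] for i in order[::-1][:n]]
    return positive, negative


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genes = [f"gene{i}" for i in range(10)]   # hypothetical gene names
    X = rng.lognormal(size=(10, 6))           # synthetic: 10 genes x 6 samples
    loadings, coeffs, s = factor_expression(X, k=2)
    print("singular values:", np.round(s, 3))
    print("component 1 top genes:", top_genes(loadings, genes, comp=0))
```

SVD component signs are arbitrary, so which genes appear as "positive" or "negative" loadings can flip between implementations without changing the factorization.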
Fig 2.
Comparison of SVD/PCA, NMF and ICA in an idealized experiment with defined cell-state proportions.
(A) Three different cell lines, T47D, MDA231, and SUM159, representing ‘cell states’ A, B, and C, respectively, were mixed in varying ratios. These heterogeneous mixtures were expression-profiled using qPCR, and SVD/PCA, NMF, and ICA were used to factor the resulting gene-expression matrix. For each heterogeneous population, the first and second components obtained by the SVD/PCA (B), NMF (C), and ICA (D) factorization algorithms were plotted against the fraction of cells in state A or B, respectively; the squared correlation coefficient (r²) is shown for each plot. (B-D, plot on right) For each algorithm, the data were plotted in Component 1 vs. Component 2 space, with replicates shown in the same color and connected by lines. (E) Heatmap of gene expression in each of states A, B, and C. Data were log-transformed and mean-normalized by row; a red/green color scale is shown. Also shown are the genes with the highest loadings, both positive (green) and negative (red), in the first two components identified by the SVD and NMF factorizations. The heatmap shows that SVD component 1 identified genes strongly up- or down-regulated in state A, while SVD component 2 identified the two genes strongly differentially expressed between states B and C. In contrast, NMF components 1 and 2 both identified genes specific to state A, which were down- and up-regulated in state A, respectively.
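Why SVD components can track cell-state fractions in this idealized experiment can be illustrated with synthetic data. This sketch assumes noiseless linear mixing of expression with cell-state fractions and uses randomly generated "pure state" profiles, not the qPCR data. Under that assumption, mean-centered three-state mixtures lie in a two-dimensional subspace, so the first two SVD components capture all the variance and the fraction of state A is an exact affine function of the two component scores. The r² for a single component, as plotted in panels B-D, depends on how the components are oriented within that plane. The function name `mixture_svd_demo` is hypothetical.

```python
import numpy as np


def mixture_svd_demo(n_genes=20, n_mixtures=12, seed=1):
    """Simulate linear mixtures of three states and relate SVD scores to fractions."""
    rng = np.random.default_rng(seed)
    # Hypothetical pure-state expression profiles for states A, B, C (linear scale)
    profiles = rng.uniform(1.0, 100.0, size=(3, n_genes))
    # Random mixing fractions; each row (A, B, C) sums to 1
    fractions = rng.dirichlet(np.ones(3), size=n_mixtures)
    # Noiseless linear mixing: samples x genes
    X = fractions @ profiles

    Xc = X - X.mean(axis=0)                 # center each gene across samples
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * s[:2]               # component 1 and 2 scores per sample
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()

    # Per-component r^2: component 1 vs fraction A, component 2 vs fraction B
    r2_single = [np.corrcoef(scores[:, c], fractions[:, c])[0, 1] ** 2 for c in range(2)]

    # r^2 of fraction A regressed on both components jointly (affine least squares)
    f_a = fractions[:, 0]
    design = np.column_stack([np.ones(n_mixtures), scores])
    beta, *_ = np.linalg.lstsq(design, f_a, rcond=None)
    resid = f_a - design @ beta
    centered = f_a - f_a.mean()
    r2_joint = 1.0 - (resid @ resid) / (centered @ centered)
    return explained, r2_single, r2_joint


if __name__ == "__main__":
    explained, r2_single, r2_joint = mixture_svd_demo()
    print(f"variance explained by 2 components: {explained:.6f}")
    print("single-component r^2:", np.round(r2_single, 3))
    print(f"joint r^2 for fraction A: {r2_joint:.6f}")
```

With measurement noise or nonlinear expression mixing, neither quantity would be exactly 1; the sketch only shows the noiseless limit.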
Fig 3.
Schematic showing the steps in the calculation of a PEACS score.
Step (1): calculate a median centroid of all perturbations (green star) in a given experiment. Step (2): calculate the centroid of a set of replicates for a single perturbation (blue star). Step (3): calculate the Euclidean distance (d) between the median centroid of all perturbations and the centroid of the replicates for that perturbation. Step (4): calculate the standard error of the mean (S.E.) for the set of replicates for that perturbation. For each set of perturbation replicates, the PEACS score is defined as the distance calculated in Step 3 divided by the S.E. calculated in Step 4. In the formula shown, k is the number of SVD principal components, determined from a Scree plot; repl is the number of replicates for a given perturbation; the remaining terms are the coefficient of the ith SVD gene-expression vector for the jth perturbation and the average coefficient across all perturbation samples for the ith SVD gene-expression vector.
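Steps 1-4 can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. The legend does not state how the per-component standard errors are pooled into one S.E., so this sketch assumes S.E. = sqrt(Σ_i s_i² / repl), where s_i² is the sample variance of component i across the replicates. It uses the component-wise median of all perturbation samples as the reference centroid, following Step 1; because the formula's definitions mention an average coefficient, replacing `np.median` with `np.mean` may be the intended variant. The function name `peacs_scores` is hypothetical.

```python
import numpy as np


def peacs_scores(coeffs, labels):
    """PEACS-style score per perturbation.

    coeffs: (n_samples, k) SVD coefficients, one row per replicate sample.
    labels: perturbation identifier for each row.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    labels = np.asarray(labels)
    # Step 1: component-wise median centroid over all perturbation samples
    center = np.median(coeffs, axis=0)
    scores = {}
    for pert in np.unique(labels):
        reps = coeffs[labels == pert]
        repl = reps.shape[0]
        if repl < 2:
            continue  # S.E. is undefined for a single replicate
        # Step 2: centroid of this perturbation's replicates
        centroid = reps.mean(axis=0)
        # Step 3: Euclidean distance to the median centroid
        d = np.linalg.norm(centroid - center)
        # Step 4 (assumed pooling): S.E. from per-component sample variances
        se = np.sqrt(reps.var(axis=0, ddof=1).sum() / repl)
        scores[str(pert)] = d / se
    return scores


if __name__ == "__main__":
    # Hypothetical 2-component coefficients: "c" is shifted along component 1
    coeffs = [[0.0, 0.1], [0.1, 0.0], [-0.1, -0.1],
              [0.1, 0.1], [-0.1, 0.0], [0.0, -0.1],
              [5.0, 0.1], [5.1, 0.0], [4.9, -0.1]]
    labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
    print(peacs_scores(coeffs, labels))
```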
Fig 4.
The MCF10A stem cell model exhibits multi-lineage mammary differentiation.
(A, upper panels) Confocal microscopy images of MCF10A collagen cultures stained with phalloidin (red) and DAPI (blue) 8 days after seeding. (A, lower panels) The images were segmented into ductal and lobular structures using CellProfiler and quantified (B). (C) 3D confocal reconstruction of a complex ductal-lobular tissue rudiment 12 days after seeding.
Fig 5.
PEACS identifies RUNX1 as a candidate regulator of mammary cell state.
PEACS was applied to the data matrix generated by expression profiling of MCF10A cell populations in which transcription factor expression was inhibited with lentiviral shRNAs. (A, left) The five genes with the highest PEACS scores are plotted as red ellipsoids centered at the coefficients for the first three SVD components, averaged across the shRNA replicates targeting each gene; the lengths of the three ellipsoid axes are the standard errors in the three respective SVD components. As a control, a purple ellipsoid is shown centered at the first three SVD components averaged across fifteen shRNAs that failed to inhibit their targets; in this case, the lengths of the three ellipsoid axes are the standard deviations of the respective SVD components. (A, right) Three genes with non-significant PEACS scores (USF1, HIF1A, and NFE2L1) are plotted as green ellipsoids centered at the coefficients for the first three SVD components, averaged across the shRNA replicates targeting each gene; the three axis lengths are the standard errors in the respective SVD components. The SVD component coefficients for the remaining non-significant genes are plotted as black dots, most of which fall within the purple control ellipsoid. (B) Density estimate and (C) p-values of PEACS scores, showing that a few perturbations robustly affect the SVD components and are statistically significant outliers. P-values were calculated by Monte Carlo resampling (see Results and Methods). † indicates Bonferroni-corrected p<0.05; * indicates nominal p<0.01. (D) Because perturbation of RUNX1 primarily affected SVD1, a heatmap compares the expression profile of RUNX1-inhibited cells with the profiles of cells in which genes predicted to have no effect on SVD1 were inhibited; in all cases, the expression shown is averaged across all shRNAs targeting the given gene.
The genes with the highest loadings in SVD component 1 are shown to the right of the heatmap, with strongly positive loadings shown as green, and strongly negative loadings shown as red.
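The legend states that p-values come from Monte Carlo resampling but defers the details to Results and Methods. A minimal sketch under explicit assumptions: the null distribution is built by drawing random sets of `repl` samples (without replacement) from all perturbation samples and scoring each set the same way as a real replicate set; the p-value uses an add-one correction; Bonferroni correction multiplies each p-value by the number of tests and caps it at 1. The S.E. pooling, sqrt(Σ_i s_i² / repl), is also an assumption, and all function names and data are hypothetical.

```python
import numpy as np


def set_score(reps, center):
    """PEACS-style score for one set of replicate coefficient vectors."""
    reps = np.asarray(reps, dtype=float)
    centroid = reps.mean(axis=0)
    # Assumed pooling of per-component variances into one standard error
    se = np.sqrt(reps.var(axis=0, ddof=1).sum() / reps.shape[0])
    return np.linalg.norm(centroid - center) / se


def monte_carlo_pvalue(observed, coeffs, repl, n_draws=10000, seed=0):
    """Fraction of random replicate-sized sets scoring at least as high as observed."""
    coeffs = np.asarray(coeffs, dtype=float)
    rng = np.random.default_rng(seed)
    center = np.median(coeffs, axis=0)
    null = np.empty(n_draws)
    for t in range(n_draws):
        idx = rng.choice(coeffs.shape[0], size=repl, replace=False)
        null[t] = set_score(coeffs[idx], center)
    # Add-one correction keeps the estimate away from exactly zero
    return (1 + np.count_nonzero(null >= observed)) / (1 + n_draws)


def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    background = rng.normal(size=(200, 3))         # synthetic SVD coefficients
    hit = 10.0 + 0.1 * rng.normal(size=(3, 3))    # hypothetical strong perturbation
    coeffs = np.vstack([background, hit])
    center = np.median(coeffs, axis=0)
    p = monte_carlo_pvalue(set_score(hit, center), coeffs, repl=3, n_draws=2000)
    print("p =", p, "adjusted:", bonferroni([p, 0.2, 0.5]))
```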
Fig 6.
RUNX1 is required for mammary stem cells to differentiate into tissue rudiments in 3D culture.
(A) Brightfield images of day 8 shRUNX1 (sh1, sh2, sh3) or control (shLuc) MCF10A cells seeded into collagen culture. % KD denotes the percent of expression lost relative to shLuc control as quantified by qPCR. (B) Western blots quantifying the percentage of expression lost relative to shLuc control at the protein level. (C) Quantification of tissue rudiments formed by shRUNX1 and shLuc cells. CellProfiler was utilized to quantify total ducts and lobules as well as ductal features (long-axis length, structure area, structure perimeter). Error bars denote SEM from 10 fields per condition. (D) Confocal microscopy of tissue rudiments of shLuc and shRUNX1 cells, taken 8 days after seeding into collagen. Dashed line indicates the hollowed region of a mature lobule.
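The legend does not say how % KD was computed from qPCR. One common approach is the comparative Ct (2^-ΔΔCt) method, which this sketch assumes, using Ct values for RUNX1 and a reference (housekeeping) gene in shRUNX1 and shLuc cells. The function name and example Ct values are hypothetical.

```python
def percent_knockdown(ct_target_sh, ct_ref_sh, ct_target_ctrl, ct_ref_ctrl):
    """Percent of target expression lost relative to control (2^-ddCt method).

    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    d_ct_sh = ct_target_sh - ct_ref_sh          # knockdown sample, normalized to reference
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # shLuc control, normalized to reference
    dd_ct = d_ct_sh - d_ct_ctrl
    relative_expression = 2.0 ** (-dd_ct)       # fraction of control expression remaining
    return 100.0 * (1.0 - relative_expression)


if __name__ == "__main__":
    # Hypothetical Ct values: RUNX1 crosses threshold 2 cycles later after knockdown
    print(percent_knockdown(26.0, 18.0, 24.0, 18.0))  # 75.0
```

A 2-cycle delay relative to control corresponds to one quarter of the control expression remaining, i.e. 75% knockdown.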
Fig 7.
RUNX1 is necessary for mammary stem cells to exit the bipotent state.
(A) Experimental schematic and brightfield images of MCF10A cells stably transduced with a dox-inducible RUNX1 shRNA and grown in collagen culture for 7 days in the presence of dox. At day 7, half of the collagen gels were removed from dox, and the other half were maintained in dox for an additional 4 days. Tissue rudiments maintained in dox remained as solid spheres, whereas spheres removed from dox rapidly sprouted ducts within 12–24 hrs. (B) Western blot confirming inducible RUNX1 inhibition by the dox-inducible shRNA. MCF10A cells were cultured without dox for 7 days (lane 1), or with dox for 4 days followed by either culture without dox for an additional 3 days (lane 2) or culture with dox for an additional 3 days (lane 3). Also shown are quantification of the western blot, normalized to GAPDH and to the no-dox control treatment, and quantification by qPCR. (C) Experimental schematic, brightfield images, and quantification of MCF10A dox-inducible shRUNX1 cells that were grown in the presence of dox for 7 days, removed from collagen, dissociated into single cells, and reseeded into a new collagen pad in the presence or absence of dox. While control cells are unable to reseed structures, inducible shRUNX1 cells reseed structures with high efficiency (reseeding capacity is shown as structures formed per 7500 cells; error bars are SEM, n = 3; * indicates p<0.05 relative to wild type by t-test). Dox-inducible shRUNX1 cells grown in the absence of dox are multipotent, with the capacity to form ducts, lobules, and complex ductal-lobular structures in 13 days.
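The two-step western blot normalization in (B), first to GAPDH and then to the no-dox lane, amounts to a pair of ratios. A minimal sketch, assuming band intensities have already been measured (for example by densitometry); the function name and intensity values are hypothetical.

```python
def normalize_western(target, loading, control_index=0):
    """Normalize band intensities to a loading control, then to a control lane.

    target, loading: band intensities per lane (e.g. RUNX1 and GAPDH).
    control_index: lane used as the reference (e.g. the no-dox lane), set to 1.0.
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    # Correct for unequal loading: target / GAPDH in each lane
    ratios = [t / g for t, g in zip(target, loading)]
    # Express every lane relative to the control lane
    ref = ratios[control_index]
    return [r / ref for r in ratios]


if __name__ == "__main__":
    # Hypothetical intensities for lanes 1-3 (no dox, dox then no dox, dox then dox)
    print(normalize_western([100.0, 90.0, 20.0], [50.0, 45.0, 50.0]))
```

Here lane 3 retains 20% of the no-dox RUNX1 signal, which would correspond to 80% of expression lost.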
Fig 8.
RUNX1 inhibition blocks the differentiation of patient-derived mammary stem cells.
(A) Schematic of the human breast progenitor cell colony formation assay. Human primary organoids were dissociated into single cells and plated onto culture plates. (B) Brightfield and fluorescent images of human progenitor cell colonies stained for CK14 and CK8/18: stem cell colonies are small and consist of cells double-positive for CK8/18 (brown in brightfield; red in fluorescence) and CK14 (red/purple in brightfield; green in fluorescence), whereas differentiated colonies are larger and have domains that are singly positive for CK8/18 or CK14. Also shown is quantification of progenitor cell colonies from control, shRUNX1, and RUNX1-overexpressing cells. (C) Immunofluorescence and quantification of heterovalent colonies formed by primary human cells infected with a dox-inducible shRUNX1 and grown in the presence of dox for 7 days. After 7 days, dox was removed and the colonies were grown for an additional 96 hours. These cultures generated heterovalent colonies, defined as colonies containing both stem and differentiated cell types. Shown is a heterovalent colony containing both CK14 and CK8/18 double-positive cells and cells that stained only for CK14 or only for CK8/18. The fraction of heterovalent colonies in each condition is shown. * indicates p<0.05 versus wild type; error bars are SEM, n = 4.
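The heterovalent classification in (C) can be written as a simple rule over per-cell marker calls. This sketch assumes each cell has already been called positive or negative for CK14 and CK8/18 (for example by intensity thresholds, which the legend does not specify). The data layout and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-cell marker calls: (CK14 positive, CK8/18 positive)
Cell = Tuple[bool, bool]


def is_heterovalent(colony: Sequence[Cell]) -> bool:
    """True if a colony has both double-positive (stem) and single-positive cells."""
    has_double = any(ck14 and ck818 for ck14, ck818 in colony)
    has_single = any(ck14 != ck818 for ck14, ck818 in colony)
    return has_double and has_single


def heterovalent_fraction(colonies: List[Sequence[Cell]]) -> float:
    """Fraction of colonies classified as heterovalent (0.0 if there are none)."""
    if not colonies:
        return 0.0
    return sum(is_heterovalent(c) for c in colonies) / len(colonies)


if __name__ == "__main__":
    colonies = [
        [(True, True), (True, False)],                # mixed: heterovalent
        [(True, True), (True, True)],                 # all double-positive
        [(False, True), (True, False)],               # single-positive only
        [(True, True), (False, True), (True, False)], # mixed: heterovalent
    ]
    print(heterovalent_fraction(colonies))
```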