Molecular Predictors of 3D Morphogenesis by Breast Cancer Cell Lines in 3D Culture

Correlative analysis of molecular markers with phenotypic signatures is the simplest model for hypothesis generation. In this paper, a panel of 24 breast cell lines was grown in 3D culture, their morphology was imaged through phase contrast microscopy, and computational methods were developed to segment and represent each colony at multiple dimensions. Consensus clustering of these morphological responses then revealed three subpopulations: round, grape-like, and stellate phenotypes. In some cases, cell lines with particular pathobiological phenotypes clustered together (e.g., ERBB2-amplified cell lines share the morphometric properties of the grape-like phenotype). Next, associations with molecular features were identified through (i) differential analysis within each morphological cluster, and (ii) regression analysis across the entire panel of cell lines. In both cases, the dominant genes that are predictive of the morphological signatures were identified. Specifically, PPARγ was associated with the invasive stellate morphological phenotype, which corresponds to triple-negative pathobiology, and this association was validated through two supporting biological assays.


Thresholding as a means for segmentation
Gabor filters eliminate the need for threshold selection and avoid the complexities that can arise from contrast reversal in phase contrast microscopy. Figure 1 shows three examples of thresholding artifacts in our data sets; these artifacts are eliminated when Gabor features are used instead.
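One property behind the contrast-reversal claim is that the magnitude (energy) of a complex, zero-mean Gabor response is unchanged when image intensities are inverted, because the filter ignores constant offsets and only the sign of its response flips. The minimal sketch below illustrates this with a small hypothetical filter bank; the function names `gabor_kernel` and `gabor_energy`, and the parameters `sigma`, `wavelength`, and the four orientations, are illustrative assumptions and not the filter bank used in this study.

```python
import numpy as np

def gabor_kernel(sigma, wavelength, theta, size=None):
    """Complex Gabor kernel (Gaussian envelope x complex sinusoid) with zero DC."""
    if size is None:
        size = int(2 * np.ceil(3 * sigma) + 1)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)      # coordinate along orientation theta
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    kernel = envelope * np.exp(1j * 2 * np.pi * x_rot / wavelength)
    # Remove the DC component so that flat (constant) regions give no response.
    return kernel - envelope * kernel.sum() / envelope.sum()

def gabor_energy(image, kernels):
    """Per-pixel maximum Gabor energy |response| over a bank (circular FFT convolution)."""
    H, W = image.shape
    img_f = np.fft.fft2(image)
    energy = np.zeros((H, W))
    for k in kernels:
        kh, kw = k.shape
        padded = np.zeros((H, W), dtype=complex)
        padded[:kh, :kw] = k
        padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # centre kernel at origin
        response = np.fft.ifft2(img_f * np.fft.fft2(padded))
        energy = np.maximum(energy, np.abs(response))
    return energy
```

For example, with `bank = [gabor_kernel(2.0, 6.0, t) for t in np.linspace(0, np.pi, 4, endpoint=False)]`, `gabor_energy(img, bank)` and `gabor_energy(1.0 - img, bank)` agree to numerical precision, so no polarity-dependent threshold has to be chosen at this stage.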

Background on Zernike Polynomials
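The Zernike polynomials $V_{mn}(x, y)$ are a set of complex functions that are orthogonal over the unit disk $x^2 + y^2 \le 1$:
$$\iint_{x^2 + y^2 \le 1} V_{mn}^*(x, y)\, V_{kl}(x, y)\, dx\, dy = \frac{\pi}{m + 1}\, \delta_{mk}\, \delta_{nl},$$
where $^*$ denotes complex conjugation and $\delta_{mk}$ is 1 if $m = k$ and 0 otherwise. Expressed in polar coordinates $(\rho, \theta)$, the Zernike polynomials are defined as
$$V_{mn}(\rho, \theta) = R_{mn}(\rho)\, e^{j n \theta},$$
where $m \ge 0$, $|n| \le m$, $m - |n|$ is even, and the radial polynomial is
$$R_{mn}(\rho) = \sum_{s=0}^{(m - |n|)/2} \frac{(-1)^s\, (m - s)!}{s!\, \left(\frac{m + |n|}{2} - s\right)!\, \left(\frac{m - |n|}{2} - s\right)!}\, \rho^{m - 2s}.$$
The significance of this representation is that the magnitudes of the resulting Zernike moments are invariant to rotation and, once each colony is centered on the unit disk, to translation, so they encode inherent morphometric properties independently of colony position and orientation.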
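To make the rotation-invariance argument concrete, the sketch below evaluates the radial polynomial $R_{mn}$ given above and a discrete Zernike moment $A_{mn} = \frac{m+1}{\pi} \iint f(x, y)\, V_{mn}^*(x, y)\, dx\, dy$ of a square image patch with the unit disk inscribed in it. This moment definition is the conventional one and is assumed here; the function names and the pixel-centre grid mapping are hypothetical choices, not necessarily how colonies were encoded in this work.

```python
import numpy as np
from math import factorial

def zernike_radial(m, n, rho):
    """Radial polynomial R_mn(rho): m = degree, n = repetition, m - |n| even."""
    rho = np.asarray(rho, dtype=float)
    n = abs(n)
    if n > m or (m - n) % 2:
        raise ValueError("require |n| <= m and m - |n| even")
    r = np.zeros_like(rho)
    for s in range((m - n) // 2 + 1):
        c = ((-1) ** s * factorial(m - s)
             / (factorial(s) * factorial((m + n) // 2 - s) * factorial((m - n) // 2 - s)))
        r += c * rho ** (m - 2 * s)
    return r

def zernike_moment(img, m, n):
    """Discrete Zernike moment A_mn of a square image mapped onto the unit disk."""
    N = img.shape[0]
    coords = (2 * np.arange(N) + 1 - N) / N          # pixel centres in (-1, 1)
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0                               # keep only pixels in the unit disk
    v_conj = zernike_radial(m, n, rho) * np.exp(-1j * n * theta)
    pixel_area = (2.0 / N) ** 2
    return (m + 1) / np.pi * np.sum(img[inside] * v_conj[inside]) * pixel_area
```

Rotating the patch by 90 degrees (`np.rot90(img)`) only multiplies $A_{mn}$ by a phase factor, so `abs(zernike_moment(img, m, n))` is unchanged; a feature vector of such magnitudes over several $(m, n)$ pairs is one way to build an orientation-independent shape descriptor.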

Molecular predictors of morphological clusters based on a nonlinear method
In the nonlinear case, the .632+ bootstrap error [1] of an SVM rule with a Gaussian kernel is used to identify differentially expressed genes. The bootstrap is a resampling method for model selection and validation; the .632+ variant has been shown to perform well for small sample sizes by correcting for selection bias. As discussed by Ambroise and McLachlan [1], the .632+ bootstrap error is estimated by
$$E_{.632+} = (1 - w)\, E_{resub} + w\, E_{bs}, \qquad (4)$$
where $E_{resub}$ is the proportion of the original cell lines misclassified by the SVM rule $R$ constructed from the data of all cell lines (i.e., the entire data set is used for training); $E_{bs}$ is the leave-one-out bootstrap error, in which each cell line is classified only by rules built from bootstrap samples that do not contain it; and $w$ is a weight.
Suppose that $K$ bootstrap samples of size $n$ are obtained by resampling with replacement from the original $N$ cell lines of known cluster labels. The resampling scheme is designed so that each bootstrap sample contains the same number of cell lines from each morphological cluster. $E_{bs}$ in Eq. (4) is then estimated by
$$E_{bs} = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{k=1}^{K} O_{ik}\, E_{ik}}{\sum_{k=1}^{K} O_{ik}},$$
where $O_{ik}$ is 0 if the $i$th cell line is in the $k$th bootstrap sample and 1 otherwise, and $E_{ik}$ is 1 if the SVM rule $R_k$, formed from the $k$th bootstrap sample, misclassifies the $i$th cell line and 0 otherwise. The weight $w$ in Eq. (4) is defined by
$$w = \frac{0.632}{1 - 0.368\, r}, \qquad r = \frac{E_{bs} - E_{resub}}{\gamma - E_{resub}},$$
where $r$ is the relative overfitting rate and $\gamma$ is the no-information error rate, estimated by
$$\gamma = \sum_{i=1}^{c} p_i\, (1 - q_i),$$
where $c$ is the number of classes (clusters), $p_i$ is the fraction of cell lines in the $i$th class with respect to the entire population, and $q_i$ is the correct recognition rate for the $i$th class as measured by the SVM rule $R$.
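The estimator in Eq. (4) can be sketched as below. This is a minimal, dependency-free illustration under several assumptions: the classifier is passed in as a hypothetical `fit_predict(X_train, y_train, X_test)` callable, and the test uses a nearest-centroid rule as a stand-in for the Gaussian-kernel SVM; `error_632plus`, `n_per_class`, `K`, and `seed` are illustrative names. Clipping $r$ to $[0, 1]$ follows the usual Efron-Tibshirani convention and is not stated in the text, and cell lines that never fall outside a bootstrap sample are skipped when averaging $E_{bs}$.

```python
import numpy as np

def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier (NOT an SVM): assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

def error_632plus(X, y, fit_predict, K=200, n_per_class=None, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    by_class = [np.flatnonzero(y == c) for c in np.unique(y)]
    if n_per_class is None:
        n_per_class = min(len(idx) for idx in by_class)

    # E_resub: rule R trained and evaluated on all cell lines.
    pred_all = fit_predict(X, y, X)
    e_resub = np.mean(pred_all != y)

    # gamma = sum_i p_i (1 - q_i), with q_i the per-class correct rate of R.
    p = np.array([len(idx) / N for idx in by_class])
    q = np.array([np.mean(pred_all[idx] == y[idx]) for idx in by_class])
    gamma = np.sum(p * (1 - q))

    # Leave-one-out bootstrap with class-balanced resampling.
    err_sum = np.zeros(N)    # accumulates O_ik * E_ik
    out_count = np.zeros(N)  # accumulates O_ik
    for _ in range(K):
        sample = np.concatenate(
            [rng.choice(idx, size=n_per_class, replace=True) for idx in by_class])
        out = np.ones(N, dtype=bool)
        out[sample] = False                  # O_ik = 1 only for lines not drawn
        if not out.any():
            continue
        pred = fit_predict(X[sample], y[sample], X[out])
        err_sum[out] += pred != y[out]
        out_count[out] += 1
    seen = out_count > 0                     # assumption: skip lines never left out
    e_bs = np.mean(err_sum[seen] / out_count[seen])

    # Relative overfitting rate r, clipped to [0, 1] (assumed convention).
    if e_bs > e_resub and gamma > e_resub:
        r = min((e_bs - e_resub) / (gamma - e_resub), 1.0)
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * e_resub + w * e_bs
```

To mirror the Gaussian-kernel SVM described above, `fit_predict` could wrap scikit-learn, e.g. `lambda Xt, yt, Xs: SVC(kernel="rbf").fit(Xt, yt).predict(Xs)`. Ranking genes by calling `error_632plus` on each gene's expression profile (an `X` of shape `(N, 1)`) and sorting by error is one plausible reading of the selection step, not a detail confirmed by the text.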
The top genes selected to predict the stellate cluster based on the .632+ bootstrap error of the SVM with a Gaussian kernel are listed in Table 1, with annotations.

Molecular predictors of morphological clusters based on GSEA
We ran gene set enrichment analysis (GSEA) on the gene expression data, using stellate vs. round/grape-like as the class labels. Table 2 shows the gene sets (gene ontology terms) enriched in the stellate cluster based on the GSEA results. PPARG appears in four of the most enriched gene sets.
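For reference, the core statistic behind GSEA is the weighted running-sum enrichment score (ES): genes are ranked by a class-comparison metric (here it would be stellate vs. round/grape-like), and the ES is the signed maximum deviation of a running sum that increases at gene-set members (weighted by $|r_j|^p$) and decreases at non-members. The sketch below computes this score for a single gene set; the function name, the default `p=1.0`, and the example gene identifiers are hypothetical, and the permutation-based significance, normalized ES, and FDR computed by the GSEA software are omitted, so it does not reproduce the settings behind Table 2.

```python
import numpy as np

def enrichment_score(genes, scores, gene_set, p=1.0):
    """Running-sum GSEA enrichment score for one gene set.

    genes: gene identifiers; scores: ranking metric per gene;
    gene_set: identifiers belonging to the set; p: weight exponent.
    """
    order = np.argsort(scores)[::-1]               # rank genes by decreasing score
    ranked = np.asarray(genes)[order]
    r = np.asarray(scores, dtype=float)[order]
    hit = np.isin(ranked, list(gene_set))
    n_hit, n_miss = hit.sum(), (~hit).sum()
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    weights = np.abs(r) ** p * hit
    p_hit = np.cumsum(weights) / weights.sum()      # weighted fraction of set genes seen
    p_miss = np.cumsum(~hit) / n_miss               # fraction of non-set genes seen
    deviation = p_hit - p_miss
    return deviation[np.argmax(np.abs(deviation))]  # signed maximum deviation from zero
```

A set concentrated at the top of the ranking yields a positive ES close to 1, and one concentrated at the bottom yields a negative ES; a gene such as PPARG contributes to every set that contains it, which is why it can recur across several enriched sets.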