Decomposition of Gene Expression State Space Trajectories

Representing and analyzing complex networks remains a roadblock to creating dynamic network models of biological processes and pathways. The study of cell fate transitions can reveal much about the transcriptional regulatory programs that underlie these phenotypic changes and give rise to the coordinated patterns in expression changes that we observe. The application of gene expression state space trajectories to capture cell fate transitions at the genome-wide level is one approach currently used in the literature. In this paper, we analyze the gene expression dataset of Huang et al. (2005) which follows the differentiation of promyelocytes into neutrophil-like cells in the presence of inducers dimethyl sulfoxide and all-trans retinoic acid. Huang et al. (2005) build on the work of Kauffman (2004) who raised the attractor hypothesis, stating that cells exist in an expression landscape and their expression trajectories converge towards attractive sites in this landscape. We propose an alternative interpretation that explains this convergent behavior by recognizing that there are two types of processes participating in these cell fate transitions—core processes that include the specific differentiation pathways of promyelocytes to neutrophils, and transient processes that capture those pathways and responses specific to the inducer. Using functional enrichment analyses, specific biological examples and an analysis of the trajectories and their core and transient components we provide a validation of our hypothesis using the Huang et al. (2005) dataset.

Under the full model ) , ( Under the reduced model ) , ( For a single gene g, the log-likelihood function for its expression profile is given by: When the likelihood ratio statistic is used as a goodness of fit test between models, the software R actually carries out an analysis of deviance, which is an extension of the likelihood ratio test. The analysis of deviance is valid as a goodness of fit test for all classes of linear models, including generalized linear models and mixed effects models.
The deviance is defined as a linear function of the log-likelihood function: The likelihood ratio statistic is simply the difference in scaled deviances between the reduced and full models: For the Normal distribution, the scaled deviances correspond to the residual sum of squares (RSS) that are obtained by fitting the reduced and full models to the expression profile data.
Because the full model specifies more parameters, automatically red full RSS RSS ≤ and the model fit or likelihood will be higher under the full model, compared to the reduced model. What we are interested in testing then is whether the improvement in fit obtained with the full model over the reduced model is statistically significant.
We compare the ratio between the likelihood ratio statistic (and its degrees of freedom) and the The rationale is that when the full model results in a very large improvement in model fit over the reduced model, full red RSS RSS − will be very large and subsequently the ratio F will also be very large. The corresponding P-value will be very small (and after adjusting for multiple testing is also likely to be still small enough to satisfy the significance threshold), hence the gene will be placed in the transient group. Alternatively, if the improvement associated with the full model is negligible, then full red RSS RSS − will be close to zero and the ratio F will be small. The corresponding adjusted P-value will be close to 1, and the gene will be placed in the core group.

RSS RSS F
Since the probability of observing this F statistic or something more extreme under the F 4,16 distribution is large (P = 0.2241) we conclude that the reduced model is an adequate fit over the full model. This gene would be classified as a core gene (our significance threshold was set at 0.1after adjustment for multiple correction testing). Note that for the 3841 genes, after adjusting all 3841 P-values using the Benjamini-Hochberg correction method, the adjusted P-value for this gene was 0.3733.