A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms
Figure 1
Higher-order generalized singular value decomposition (HO GSVD).
In this raster display of Equation (1), with overexpression (red), no change in expression (black) and underexpression (green) centered at gene- and array-invariant expression, the S. pombe, S. cerevisiae and human global mRNA expression datasets are tabulated as organism-specific genes × 17-arrays matrices $D_1$, $D_2$ and $D_3$. The underlying assumption is that there exists a one-to-one mapping among the 17 columns of the three matrices but not necessarily among their rows. These matrices are transformed to the reduced diagonalized matrices $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$, each of 17-“arraylets,” i.e., left basis vectors × 17-“genelets,” i.e., right basis vectors. The transformation uses the organism-specific genes × 17-arraylets transformation matrices $U_1$, $U_2$ and $U_3$ and the shared 17-genelets × 17-arrays transformation matrix $V^T$, so that $D_i = U_i \Sigma_i V^T$. We prove that with our particular $V$ of Equations (2)–(4), this decomposition extends to higher orders all of the mathematical properties of the GSVD except for complete column-wise orthogonality of the arraylets, i.e., the left basis vectors that form the matrices $U_1$, $U_2$ and $U_3$. We therefore mathematically define, in analogy with the GSVD, the “common HO GSVD subspace” of the $N$ matrices (here $N = 3$) to be the subspace spanned by the genelets, i.e., right basis vectors $v_k$, that correspond to higher-order generalized singular values that are equal, $\sigma_{1,k} = \sigma_{2,k} = \cdots = \sigma_{N,k}$. As we prove, the corresponding arraylets, i.e., the left basis vectors $u_{1,k}$, $u_{2,k}$ and $u_{3,k}$, are orthonormal to all other arraylets in $U_1$, $U_2$ and $U_3$. We show that, like the GSVD for two organisms [7], the HO GSVD provides a sequence-independent comparative mathematical framework for datasets from more than two organisms, where the mathematical variables and operations represent biological reality: genelets of common significance in the multiple datasets, and the corresponding arraylets, represent cell-cycle checkpoints or transitions from one phase to the next, common to S. pombe, S. cerevisiae and human. Simultaneous reconstruction and classification of the three datasets in the common subspace that these patterns span outline the biological similarity in the regulation of their cell-cycle programs. Notably, genes of significantly different cell-cycle peak times [19] but highly conserved sequences [20], [21] are correctly classified.
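The caption does not reproduce Equations (2)–(4). The minimal NumPy sketch below therefore assumes a construction for the shared $V$: $A_i = D_i^T D_i$, then $S = \frac{1}{N(N-1)} \sum_{i<j} (A_i A_j^{-1} + A_j A_i^{-1})$, then $S V = V \Lambda$. Under that assumption, each $D_i = U_i \Sigma_i V^T$ is recovered column by column from $B_i = D_i V^{-T}$. The names `ho_gsvd` and `common_subspace` are hypothetical, as is the tolerance `tol` used to decide that an eigenvalue $\lambda_k$ equals 1, which is taken here to mark the equal-$\sigma_{i,k}$ (common) genelets. The sketch also assumes that every $D_i$ has full column rank, as a genes × 17-arrays matrix normally would.

```python
import numpy as np


def ho_gsvd(matrices):
    """Hypothetical HO GSVD sketch for N >= 2 real matrices with equal column counts.

    Assumes each D_i has full column rank, so that A_i = D_i^T D_i is invertible.
    Returns (U list, sigma array of shape (N, n), shared V, eigenvalues of S)
    with D_i ≈ U_i @ diag(sigma_i) @ V.T.
    """
    n_mat = len(matrices)
    if n_mat < 2:
        raise ValueError("need at least two matrices")
    grams = [d.T @ d for d in matrices]            # A_i = D_i^T D_i
    inverses = [np.linalg.inv(a) for a in grams]   # explicit inverses, for clarity only
    n = grams[0].shape[0]
    s = np.zeros((n, n))
    for i in range(n_mat):
        for j in range(i + 1, n_mat):
            s += grams[i] @ inverses[j] + grams[j] @ inverses[i]
    s /= n_mat * (n_mat - 1)                       # mean over all ordered pairs (i, j), i != j
    eigvals, v = np.linalg.eig(s)                  # S V = V Lambda; S is not symmetric in general
    eigvals, v = eigvals.real, v.real              # drop round-off imaginary parts (assumed real)
    order = np.argsort(eigvals)                    # smallest first, so lambda_k ≈ 1 comes first
    eigvals, v = eigvals[order], v[:, order]
    v = v / np.linalg.norm(v, axis=0)              # unit-length genelets v_k
    us, sigmas = [], []
    for d in matrices:
        b = np.linalg.solve(v, d.T).T              # B_i = D_i V^{-T}, hence D_i = B_i V^T
        sigma = np.linalg.norm(b, axis=0)          # sigma_{i,k} = ||b_{i,k}||
        us.append(b / sigma)                       # unit-length arraylets u_{i,k}
        sigmas.append(sigma)
    return us, np.array(sigmas), v, eigvals


def common_subspace(eigvals, tol=1e-6):
    """Indices k with lambda_k within tol of 1 (tol is an assumed threshold)."""
    return np.flatnonzero(np.abs(eigvals - 1.0) < tol)
```

With real expression data, $\lambda_k$ is unlikely to equal 1 exactly, so the choice of `tol` is a judgment call rather than part of the method. The explicit inverses keep the sketch close to the formula. They are not a numerically robust choice for ill-conditioned $A_i$.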