Genes Influenced by the Non-Muscle Isoform of Myosin Light Chain Kinase Impact Human Cancer Prognosis

The multifunctional non-muscle isoform of myosin light chain kinase (nmMLCK) is critical to the rapid dynamic coordination of the cytoskeleton involved in cancer cell proliferation and migration. We identified 45 nmMLCK-influenced genes by bioinformatic filtering of genome–wide expression in wild type and nmMLCK knockout (KO) mice exposed to preclinical models of murine acute inflammatory lung injury, pathologies that are well established to include nmMLCK as an essential participant. To determine whether these nmMLCK-influenced genes were relevant to human cancers, the 45 mouse genes were matched to 38 distinct human orthologs (M38 signature) (GeneCards definition) and underwent Kaplan-Meier survival analysis in training and validation cohorts. These studies revealed that in training cohorts, the M38 signature successfully identified cancer patients with poor overall survival in breast cancer (P<0.001), colon cancer (P<0.001), glioma (P<0.001), and lung cancer (P<0.001). In validation cohorts, the M38 signature demonstrated significantly reduced overall survival for high-score patients of breast cancer (P = 0.002), colon cancer (P = 0.035), glioma (P = 0.023), and lung cancer (P = 0.023). The association between M38 risk score and overall survival was confirmed by univariate Cox proportional hazard analysis of overall survival in the both training and validation cohorts. This study, providing a novel prognostic cancer gene signature derived from a murine model of nmMLCK-associated lung inflammation, strongly supports nmMLCK-involved pathways in tumor growth and progression in human cancers and nmMLCK as an attractive candidate molecular target in both inflammatory and neoplastic processes.


Introduction
Cancer cell proliferation and migration require rapid dynamic regulation of the cytoskeleton, which is controlled by series of cytoskeleton regulatory proteins, in which myosin light chain kinase (MLCK) is a critical participant [1,2].In addition, endothelial cell paracellular extravasation and diapedesis by tumor cells is an essential step for malignant tumor metastasis and significantly influenced the activity of MLCK [3,4].Although still underestimated, MLCK started to be considered as a novel functional protein in cancer pathogenesis (initiation, proliferation, migration, and metastasis) [5,6,7].This is especially true with the more widely expressed non-muscle isoform (nmMLCK).Nonmuscle myosin light chain kinase or nmMLCK is centrally involved in driving rearrangement of the cytoskeleton, which regulates vascular endothelial barrier function, angiogenesis, endothelial cell apoptosis, and leukocytic diapedesis [8].In vivo studies implicated nmMLCK as an attractive target for ameliorating the adverse effects of dysregulated lung inflammation, including extravasation of inflammatory leukocytes [9,10], similar with the procedure of cancer cell metastasis to lung tissues [11].Deletion or silencing of nmMLCK produced greater protection against acute lung injury (ALI) and ventilator-induced lung injury (VILI) and significantly decreased alveolar and vascular permeability and lung inflammation [9].
Recently, we reported that endothelial inflammation is a key mediator of tumor growth and progression [12], supported by the fact that a molecular signature reflective of the endothelial inflammatory gene expression is predictive of clinical outcome in multiple types of human cancer [12].As nmMLCK plays a central role in regulation of endothelial cytoskeleton and endothelial inflammation, we would hypothesize that nmMLCK-related cellular signaling actively participate in the tumor progression and metastasis, although little is known regarding the effect of nmMLCK on the pathogenesis of tumor and its influence on the prognosis of human cancers.
In this present study, we would like to use nmMLCK-associated gene network (nmMLCK-deregulated gene sets) to establish a novel methodology for human cancer prognoses, by using a computational biology approach.
The purpose of this study is two-fold.The first was to identify the genes potentially regulated by nmMLCK.The second was to develop a prognostic cancer gene signature derived from the nmMLCK-associated genes.Using an experimental murine model of lung injury induced by mechanical ventilation with increased tidal volumes (the VILI model), we characterized the top differentially expressed genes between VILI-challenged wild-type (WT) mice and nmMLCK knockout (KO) mice.The mouse genes mediated by nmMLCK expression were identified.We matched the nmMLCK-mediated mouse genes to their human orthologs, which formed the basis of a multivariate molecular predictor of overall survival in several human cancers, including lung cancer, breast cancer, colon cancer, and glioma.This molecular signature predicted outcome independently of, but cooperatively with, standard clinical and pathological prognostic factors including patient age, lymph node involvement, tumor size, tumor grade, and so on.

Gene expression data
Microarray data of lung RNA from WT control, VILI-exposed WT, and VILI-exposed nmMLCK KO mice were obtained from NCBI GEO database (GSE14525) [9].We used this dataset to filter out the nmMLCK-mediated mouse genes.
The gene expression datasets representing human cancers were downloaded from publicly available repositories.These datasets were chosen based on the availability of clinical survival data and the large size of samples.For each tumor type, training and validation cohorts were constructed.The dataset for breast cancer (n = 295) was available from http://bioinformatics.nki.nl/data.php (Netherlands Cancer Institute, Computational Cancer Biology Data Repository) [13].The breast cancer patients were randomly separated into two parts (1/2 for training and 1/2 for validation).For colon cancer, we downloaded two datasets from a single study [14].One dataset was used as training cohort (n = 177; GSE17536) and the other one was used for validation (n = 55; GSE17537).For glioma, distinct datasets from two different studies were obtained for training (n = 77; GSE4271) [15] and validation (n = 50; http://www.broadinstitute.org/cgi-bin/ca?ncer/datasets.cgi)[16].Lastly, we obtained three datasets (n = 359) for lung cancer which were available from a single study [17].Two datasets were combined as training cohort (n = 161) and the other one was used as validation cohort (n = 178).The CEL files for the study are available at https://caarraydb.nci.nih.gov/caarray/publicExperimentDetailAction.do?expId = 101594523614 1280.

Statistical analysis
SAM (Significance Analysis of Microarrays) [18], implemented in the samr library of the R Statistical Package [19], was used to compare log 2 -transformed gene expression levels between WT control, VILI-exposed WT, and VILI-exposed nmMLCK KO mice.False discovery rate (FDR) was controlled using the q-value method [20].Transcripts with a fold-change greater than 2 and FDR less than 10% were deemed differentially expressed.We searched for any enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) [21] physiological pathways among the differential genes relative to the final analysis set using the NIH/ DAVID [22,23].Hierarchical clustering via complete linkage rule with Euclidean distance metric was applied to visualize gene expression differences, using gplots library of R Statistical Package [19].
For each cancer training/validation dataset, we normalized the gene expression level into the scale of [21,1] by POE (probability of expression) algorithm [24,25] implemented in the metaArray library of the R Statistical Package [19].Based on the gene expression and clinical outcome data from the training dataset, we can assign a Wald statistic generated by univariate Cox proportional-hazard regression to each gene as a weight.A risk score was calculated for each patient using a linear combination of weighted gene expression as below: Here, s is the risk score of the patient; n is the number of differentially expressed genes; w i denotes the weight of gene i; e i denotes the expression level of gene i; and m i and t i are the mean and standard deviation of the gene expression values for gene i across all samples, respectively.Patients were then divided into high-score and low-score groups with the median of the risk score as the threshold value.A high score indicated a poor outcome.The weight of each gene was fixed, based on the training groups, and then tested in the validation groups [12].Overall survival was analyzed by the Kaplan-Meier method.Differences in survival were tested for statistical significance by the log-rank test.P-values of less than 0.05 were considered to indicate statistical significance.The survival library of the R Statistical Package [19] was used to conduct survival analysis on the risk score.

nmMLCK-mediated mouse genes
At the specified significance level (fold-change .2 and FDR,10%), 365 genes were found be differentially expressed between VILI-exposed WT and nmMLCK KO mice, among which 117 genes were up-regulated while 248 genes were downregulated in nmMLCK KO mice (Table S1).Several pathways were significantly enriched among these differentially expressed genes (P,0.05), such as vascular smooth muscle contraction, chemokine signaling pathway, calcium signaling pathway, ErbB signaling pathway, focal adhesion and so on (Figure 1A).
To filter out the top genes potentially associated with nmMLCK, we also compared the gene expression between WT control and VILI-exposed WT mice.981 genes were found to be differentially expressed (fold-change .2 and FDR,10%) between these two groups (Table S2).Among these genes, we retained the genes with opposite direction of differential expression comparing Table S1 and S2.In other words, only the genes with attenuated VILI-mediated gene expression in nmMLCK KO mice were considered here.This step yielded 53 mouse genes.Lastly, we excluded the genes differentially expressed between WT control and VILI-exposed nmMLCK KO mice.In total, we retained 45 mouse genes for further study.Pathway analysis demonstrated a significant linkage of this gene set to ErbB signaling pathway, Glioma, Circadian rhythm (Figure 1B), which suggests that nmMLCK signaling contributes to the development and malignancy of tumors.Expression heatmap indicates that the expression profile of the 45 mouse genes were recovered at approximately normal levels of expression in nmMLCK KO mice exposed to VILI (Figure 1C).We deemed these 45 mouse genes as nmMLCK-mediated gene set (Table 1).

Prognostic molecular signature
nmMLCK is potentially involved in the pathogenesis of cancers [3,4,26].To determine whether nmMLCK-mediated genes derived from nmMLCK KO mouse model were relevant to human cancers, we matched the 45 nmMLCK-mediated mouse genes to 38 distinct human orthologs (M38 signature) according to the definition of GeneCards [27,28] (Table 2).We hypothesized that the M38 signature would be predictive of tumor outcome in cancer patients.
We constructed a risk scoring system that combined gene expression of M38 with risk for death in the training dataset.High-score patients were defined as those having a risk score greater than or equal to the group median score.In independent validation cohorts, we tested the ability of the risk score to stratify patients into prognostic groups.We performed Kaplan-Meier survival analysis comparing the high-score and low-score groups and determined statistical significance by log-rank tests.As expected, the M38 signature was able to identify patients with poor overall survival in breast cancer (P,0.001),colon cancer (P,0.001),glioma (P,0.001), and lung cancer (P,0.001) in the training cohorts (Figure S1).In the validation cohorts, Kaplan-Meier survival analysis comparing patient groups demonstrated a significantly reduced overall survival for high-score patients of breast cancer (P = 0.002), colon cancer (P = 0.035), glioma (P = 0.023), and lung cancer (P = 0.023) (Figure 2).The association between M38 risk score and overall survival was also confirmed by univariate Cox proportional hazard analysis of overall survival in both training and validation cohorts (Table 3).In the validation cohorts, high-score patients had an increased risk for death of 3.10-fold in breast cancer, 2.96-fold in colon cancer, 2.23-fold in glioma, and 1.60-fold in lung cancer.

Independence of M38 from other clinicopathologic factors
We investigated the performance of the M38 signature in comparison with clinicopathologic variables associated with prognosis in human cancers.A multivariate Cox analysis of overall survival indicated that M38 status remained a significant covariate in relation to the standard clinicopathologic factors in the four types of human cancers, such as patient age, lymph node status, tumor size, tumor grade, and so on (Table 4).We next stratified the patients according to the factors significant on multivariate analysis.
For breast cancer, patients were stratified by age, tumor grade, and estrogen receptor (ER) status, respectively.For patients with age ,40 and $40, the high-score ones had a significantly increased risk for death of 6.36-fold (P,0.001) and 2.80-fold (P = 0.001), respectively.For patients with tumor grade ,2 and $2, the high-score patients had an increased risk for death of 2.63fold (P = 0.410) and 2.65-fold (P,0.001),respectively.For patients with negative and positive ER status, the high-score patients had a significantly increased risk for death of 2.25-fold (P = 0.025) and 4.00-fold (P,0.001),respectively.
For colon cancer, patients were grouped by age and clinical stage, respectively.For patients with age ,60 and $60, the high-  score ones had a significantly increased risk for death of 2.29-fold (P = 0.025) and 2.88-fold (P,0.001),respectively.For patients with stage ,3and $3, the high-score ones had a significantly increased risk for death of 3.50-fold (P = 0.015) and 1.71-fold (P = 0.024), respectively.
Patients with glioma were grouped by age.For patients with age ,45 and $45, the high-score ones had a significantly increased risk for death of 3.46-fold (P = 0.004) and 2.00-fold (P = 0.045), respectively.
Lung cancer patients were stratified by age, lymph node status, and tumor size, respectively.For patients with age ,65 and $65, the high-score ones had a significantly increased risk for death of 2.35-fold (P,0.001) and 1.97-fold (P,0.001),respectively.For patients with and without lymph node involvement, the high-score patients had a significantly increased risk for death of 1.62-fold (P = 0.012) and 1.73-fold (P = 0.014), respectively.For patients with tumor size ,T3 and $T3, the high-score patients had an increased risk for death of 2.20-fold (P,0.001) and 1.63-fold (P = 0.180), respectively.
Kaplan-Meier survival analysis also demonstrated a significantly reduced overall survival for high-score patients in each subset grouped by each clinicopathologic factor (Figure S2-S5).Taken together, these results suggest that the expression of M38 signature is associated with clinical outcomes and is an independent prognostic factor.

Discussion
This current study confirms an internal link between nmMLCK-mediate signaling and clinical cancer mortality with novel evidence: first, we defined a group of nmMLCK-driven genes with a murine model of lung inflammatory injury under which the effects of nmMLCK are amplified.Second.This nmMLCK-centralized molecular signature reflective of lung inflammatory gene expression is highly predictive of poor clinical outcome in four types of human cancer.
MLCK (gene code: MYLK) is a Ca 2+ /calmodulin-dependent kinase that phosphorylates myosin light chains (MLCs) to promote myosin interaction with cytoskeletal actin filaments [29].It plays a key role in cytoskeleton rearrangement and contractile activities of both non-muscle tissue [30] and smooth muscle tissues [31].The non-muscle isoform, nmMLCK, has been demonstrated to be a key participant in the inflammatory response based its ability to regulate vascular endothelium integrity and leukocyte influx from circulation into the lung broncoalveolar space [9].Similar to pathogenesis in endothelial cells in ALI, cancer cell proliferation and migration require rapid dynamic regulation of the cytoskeleton, which is controlled by a group of cytoskeleton regulatory proteins, in which nmMLCK serves as a critical and central participant [1,2].In addition, trans-cellular extravasation, the essential step for malignant tumor metastasis, is well controlled by the activity of MLCK [3,4].Although still underestimated, MLCK started to be considered as a novel functional protein in cancer pathogenesis (initiation, proliferation, migration, and metastasis) [5,6,7].This is especially true with the more widely expressed nonmuscle isoform (nmMLCK).
Although little is known regarding the mechanisms of nmMLCK in the pathogenesis of tumor and its influence on the prognosis of human cancers, inflammatory response that regulated by nmMLCK in lungs is playing an active role in tumorogenesis and many successful therapies targeting chronic inflammation directly alter endothelial gene expression [32].Murine VILI model amplifies the nmMLCK-mediated gene expression and serves as a satisfactory platform to dissect nmMLCK molecular signature in lung inflammatory injury.
Compared to a previous study [12], we used a non-conventional inflammation marker nmMLCK (compared to TNFa), which is more related to endothelial inflammation, as nmMLCK is selective expressed in non-muscle tissues such as endothelium [29].Combined together, these two studies further verify the key role of ''endothelial-specific'' inflammation in cancer progression.Since nmMLCK is also expressed in other tissue types including epithelium and inflammatory leukocytes (same as TNFa), amplified molecular signature of nmMLCK by lung inflammation (M38 signature) might also involve other type of tissues in lungs, i.e., epithelium and infiltrated neutrophils.The potential contribution of M38 signature in pathogenesis in these tissues to cancer prognosis might also be important.
Our next study will focus on validation of these candidate genes filtered out in both nmMLCK and TNFa studies and generate a more accurate cancer prognosis platform with a refined gene set, which will lead to the development of cancer risk prediction/ prognosis gene array in clinical trials.MYLK is not in the M38 gene list, although the 38 genes were based on nmMLCK knockout mice.The possible complex reason might be that nmMLCK (210 Kd) is an isotype of MYLK gene product, while MYLK also produces smMLCK (108 Kd), which comprises of .80% of the MYLK gene products in lung.nmMLCK knockout does not interfere with smMLCK expression, but the microarray platform does not differentiate nmMLCK from smMLCK.This fact drives successful filtration of the 38 nmMLCK-mediated genes, but MYLK was not able to survive in the M38 gene list.To address the effect of MYLK in cancer survival prediction, we re-analyzed our datasets with the 39 genes (M38 genes plus MYLK), but no obvious improvement was found (Table S3).Nevertheless, several recent studies indicate that nmMLCK expression is indeed changed in human cancers, such as colorectal cancer [33] and prostate cancer [34].
We used a scoring system to assign a M38-based risk score to each patients.This scoring system can also be directly applied to other published cancer gene signatures.The comparison between cancer gene signatures can be simply conducted by comparing the prognostic power of the risk scores of individual gene signatures.In this study, we used the median of M38 score to divide the patents into two parts (high-score and low-score patients) to do categorized analyses (such as Kaplan-Meier analysis and log-rank test).Clinically, we can use zero as an absolute cutoff to stratify the patients into high-risk and low-risk groups, because the median of M38 score is approximately equal to zero in each dataset.
This study provides the first prognostic cancer gene signature derived from a murine model of nmMLCK-associated lung inflammation.Activation of nmMLCK-involved pathways contributes to tumor growth and progression in human cancers.These findings support the notion that nmMLCK is an attractive candidate molecular target in lung diseases.(PDF)

Supporting Information
Table S1 Differentially expressed genes between VILIexposed WT and VILI-exposed nmMLCK KO mice.(PDF) Table S2 Differentially expressed genes between WT control and VILI-exposed WT mice.(PDF) Table S3 Univariate Cox proportional hazards regression of overall survival against M38+MYLK signature status. (PDF)

Figure 2 .
Figure 2. Expression of M38 signature predicts poor clinical outcome in multiple human cancers.Kaplan-Meier survival curves for patient groups identified by M38 risk score.Red curves are for the high-score patients while blue curves are for the low-score patients.High-score patients are defined as those having a M38 risk score greater than or equal to the group median score.P-values indicate significant differences in overall survival as measured by log-rank tests.doi:10.1371/journal.pone.0094325.g002

Figure
Figure S1 Application of the M38 signature to training datasets representing four human cancers.Kaplan-Meier survival curves for patient groups identified by M38 risk score.Red curves are for the high-score patients while blue curves are for the low-score patients.High-score patients are defined as those having a M38 risk score greater than or equal to the group median score.P-values indicate significant differences in overall survival as measured by log-rank tests.(PDF) Figure S2 M38 signature adds prognostic value to clinicopathologic factors associated with survival in human breast cancer.Kaplan-Meier survival curves of patient cohorts grouped by (A) age, (B) tumor grade, or (C) ER status.Red curves are for the high-score patients while blue curves are for the low-score patients.High-score patients are defined as those having a M38 risk score greater than or equal to the group median score.P-values indicate significant differences in overall survival as measured by log-rank tests.(PDF) Figure S3 M38 signature adds prognostic value to clinicopathologic factors associated with survival in human colon cancer.Kaplan-Meier survival curves of patient cohorts grouped by (A) age or (B) clinical stage.Red curves are for the high-score patients while blue curves are for the low-score patients.High-score patients are defined as those having a M38 risk score greater than or equal to the group median score.Pvalues indicate significant differences in overall survival as measured by log-rank tests.(PDF)
a FC: fold change, which is calculated by dividing the expression in VILI-exposed WT mice by the expression in WT control mice.b FC: fold change, which is calculated by dividing the expression in VILI-exposed nmMLCK KO mice by the expression in WT VILI-exposed mice.doi:10.1371/journal.pone.0094325.t001

Table 4 .
Multivariate Cox proportional hazards regression of overall survival.Figure S4 M38 signature adds prognostic value to clinicopathologic factors associated with survival in human glioma.Kaplan-Meier survival curves of patient cohorts grouped by age.Red curves are for the high-score patients while blue curves are for the low-score patients.High-score patients are defined as those having a M38 risk score greater than or equal to the group median score.P-values indicate significant differences in overall survival as measured by log-rank tests.(PDF) Figure S5 M38 signature adds prognostic value to clinicopathologic factors associated with survival in human lung cancer.Kaplan-Meier survival curves of patient cohorts grouped by (A) age, (B) lymph node status, or (C) tumor size.Red curves are for the high-score patients while blue curves are for the low-score patients.High-score patients are defined as those having a M38 risk score greater than or equal to the group median score.P-values indicate significant differences in overall survival as measured by log-rank tests.