Weighted Change-Point Method for Detecting Differential Gene Expression in Breast Cancer Microarray Data

  • Yao Wang,

    Affiliation Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun, China

  • Guang Sun,

    Affiliation Department of Breast and Thyroid Surgery, China-Japan Union Hospital, Changchun, China

  • Zhaohua Ji,

    Affiliations Department of Communication Engineering, Jilin University, Changchun, China, College of Computer Science and Technology, Inner Mongolia Normal University, Huhhot, China

  • Chong Xing,

    Affiliations Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun, China, Guanghua College of Changchun University, Changchun, China

  • Yanchun Liang

    ycliang@jlu.edu.cn

    Affiliation Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun, China


Abstract

In previous work, we proposed a method for detecting differential gene expression (DGE) based on the change-point of the expression profile. This non-parametric change-point statistic (NPCPS) method gave promising results in both a simulation study and experiments on public datasets. However, its performance is still limited by low sensitivity to the right bound, and the statistical significance of the statistic has not been fully explored. To overcome the insensitivity to the right bound, we modified the original method by adding a weight function to the Dn statistic. A simulation study showed that the weighted change-point statistic (WCPS) method is significantly better than the original NPCPS in terms of ROC, false positive rate (FPR), and change-point estimate. The mean absolute error of the change-point estimated by the weighted method was 0.03, a reduction of more than 50% compared with the original 0.06, and the mean FPR was reduced by more than 55%. The experiment on microarray Dataset I yielded 3974 differentially expressed genes out of a total of 5293 genes; the experiment on microarray Dataset II yielded 9983 differentially expressed genes out of a total of 12576 genes. In summary, the method proposed here is an effective modification of the previous method, especially when only a small subset of cancer samples has DGE.

Introduction

Selecting differentially expressed genes [1], [2] is one of the most important tasks in microarray applications. Many methods have been proposed to compare patterns of gene expression between cells or tissues of different kinds and under different conditions, for example, between normal and cancer cells. The goal of these methods has been to enable faster, simpler, more sensitive and more systematic analyses [3]. Among these methods, the t-statistic is a classical and widely used DGE detection method, which works on the hypothesis that all the cancer samples are over-expressed compared with the normal samples [4]. Several other methods are also based on this hypothesis, such as the empirical Bayes approach [5], the mixture model approach [6], and SAM [7]. However, given the heterogeneity of gene activation, many genes show increased expression in disease samples, but only in a small number of those samples [8]. The study of Tomlins et al. [9], [10] shows that the t-statistic has low power in this case, and they introduced the cancer outlier profile analysis (COPA) method, which performs better than the traditional t-statistic on cancer microarray data sets. More recently, several advances have been made in this direction, aiming to design better statistics that account for the heterogeneous activation pattern of cancer genes: the non-parametric methods PPST (permutation percentile separability test) [11] (Lyons-Weiler, 2004) and LRS (likelihood ratio test) [12] (Hu, 2008); the percentile-based methods OS (outlier sum) [13] (Tibshirani, 2007), ORT (outlier robust t-statistics) [14] (Wu, 2007) and TriORT [15], an improvement to ORT; and MOST (maximum ordered subset t-statistics) [16] (Lian, 2008) and TriMOST [17], an improvement to MOST.

Previously, we proposed a non-parametric change-point statistic (NPCPS) method [18], based on a modified Kolmogorov statistic, to detect a single change-point (CP) in a data sequence [19]. This method directly compares the data distributions of the normal and cancer groups to detect a possible change-point in the cancer group, and it also gives an estimate of the change-point. Moreover, as a non-parametric inferential method, NPCPS makes no assumptions about the probability distributions of the variables being assessed; accordingly, it is not necessary to normalize the microarray data before calculating the test statistic, as parametric methods usually require. In simulations and experiments, NPCPS was effective for DGE detection and outperformed the compared methods, with better ROC results in many circumstances [18]. However, the performance of this change-point based method is still limited by its low sensitivity to the right bound, and the statistical significance of the statistic has not been fully explored. Therefore, we present here an improved method, Weighted Change-Point Statistics (WCPS), that aims to overcome these limitations.

Results and Discussion

Monte Carlo simulation and ROC analysis

Monte Carlo simulation was applied to evaluate the hypothesis test used in the proposed method. For each Monte Carlo simulation, the proposed method was applied to an artificial 7000-gene dataset drawn from a normal distribution (mean = 0, standard deviation = 1). Multiple simulations were carried out with positive μ = 2, different sample sizes n (normal group size n1 and cancer group size n2 both equal to n/2) and different DGE sample sizes k (0<k<n2). The false positive rate (FPR, defined here as the proportion of genes with DGE that were recognized as having no DGE) and the average estimate of the change point (Table 1 and Table 2) were computed. Generally, for both methods, the change-point estimate and the FPR improved together as k increased; after the FPR dropped below the significance level (0.01 in this case), the estimated position converged to the actual position. However, for a given k, the proposed method outperformed the original NPCPS, with a closer CP estimate and a lower FPR; as k increased, the proposed method converged faster to the true change point and reached zero FPR before the original NPCPS method. For normally distributed data, the FPR of WCPS versus NPCPS was 0.09 versus 0.17; for skew-normally distributed data, it was 0.08 versus 0.12. In addition, the mean absolute error (MAE) of the CP estimated by WCPS was 0.03, while the MAE of NPCPS was more than 0.06.
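The simulated data described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' simulation code: the text gives the distribution, μ, the group sizes and k, but not how DGE is injected, so the sketch assumes that μ is added to k cancer samples of every gene (here the last k, an arbitrary choice). The function name simulate_dataset and its parameters are hypothetical.

```python
import random

def simulate_dataset(n_genes=7000, n1=25, n2=25, k=3, mu=2.0, seed=0):
    """Return (normal, cancer): one list of expression values per gene.

    Every value is drawn from N(0, 1). ASSUMPTION: DGE is mimicked by
    adding mu to the last k cancer samples of each gene.
    """
    rng = random.Random(seed)
    normal, cancer = [], []
    for _ in range(n_genes):
        normal.append([rng.gauss(0.0, 1.0) for _ in range(n1)])
        row = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        for j in range(n2 - k, n2):
            row[j] += mu  # shift only the DGE subset of the cancer group
        cancer.append(row)
    return normal, cancer
```

A ROC analysis would additionally need genes without DGE, which under the same assumption could be generated with mu=0.0.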

Table 1. CP estimate and FPR on data in normal distribution of size n1 = n2 = 25 with positive μ = 2.

https://doi.org/10.1371/journal.pone.0029860.t001

Table 2. CP estimate and FPR on data in normal distribution of size n1 = n2 = 50 with positive μ = 2.

https://doi.org/10.1371/journal.pone.0029860.t002

Results of more simulations with different μ and k are in Table 3.

Table 3. CP estimate and FPR on data in normal distribution of size n1 = n2 = 25 with different μ and k.

https://doi.org/10.1371/journal.pone.0029860.t003

The proposed method and seven other methods used for comparison were then applied to two types of dataset, one drawn from a normal distribution and the other from a skew-normal distribution; each type contained several datasets with different μ, n and k. The seven other methods are NPCPS, LRS, TriMOST, TriORT, COPA, OS and the T-statistic. The AUC of the ROC analysis on the two types of dataset is summarized in Table 4 and Table 5, and the ROC curves are shown in Fig. 1 and Fig. 2, respectively.
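For readers who want to reproduce this kind of comparison, the AUC of a ranking statistic can be computed directly from per-gene scores. The sketch below uses the standard rank interpretation of AUC (the probability that a gene with DGE receives a higher score than a gene without DGE, with ties counted as one half). It is a generic illustration, not the authors' evaluation code, and the function name roc_auc is hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve from the scores of DGE (positive) genes
    and non-DGE (negative) genes; ties contribute 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a sketch; a sort-based rank computation would be preferable for large gene sets.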

Figure 1. ROC curves of the simulation on data in normal distribution.

(A) n1 = n2 = 25, μ = 2, k = 3. (B) n1 = n2 = 25, μ = 2, k = 9. (C) n1 = n2 = 50, μ = 2, k = 1. (D) n1 = n2 = 50, μ = 2, k = 4. The x-axis is FPR, and the y-axis is TPR. The significance level α = 0.01 for WCPS and NPCPS. Larger area under ROC curves indicates better sensitivity and specificity. An ROC curve along the diagonal line indicates random-guess.

https://doi.org/10.1371/journal.pone.0029860.g001

Figure 2. ROC curves of the simulation on data in skew-normal distribution.

(A) n1 = n2 = 25, μ = 2, k = 3. (B) n1 = n2 = 25, μ = 2, k = 5. (C) n1 = n2 = 50, μ = 2, k = 3. (D) n1 = n2 = 50, μ = 2, k = 5. The x-axis is FPR, and the y-axis is TPR. The significance level α = 0.01 for WCPS and NPCPS. Larger area under ROC curves indicates better sensitivity and specificity. An ROC curve along the diagonal line indicates random-guess.

https://doi.org/10.1371/journal.pone.0029860.g002

Table 4. AUC of ROC curves of the simulation on data in normal distribution.

https://doi.org/10.1371/journal.pone.0029860.t004

Table 5. AUC of ROC curves of the simulation on data in skew-normal distribution.

https://doi.org/10.1371/journal.pone.0029860.t005

The results show that the proposed method had a larger AUC than the other methods, with a more pronounced advantage when k was smaller. Generally, the change-point based methods, namely WCPS, NPCPS and LRS, were better than the percentile-based methods in terms of ROC in the simulation study, with WCPS performing best; among the percentile-based methods, the T-statistic was very effective, while TriORT and TriMOST were better than the other two methods in terms of ROC.

The simulation results indicate that, by adding a weight to the original function, the proposed method becomes more sensitive to smaller k.

DGE detection in microarray data of breast-cancer

Result on Dataset I.

Dataset I contains microarray data of 49 samples from breast cancer tissues, as described in the Materials and Methods section. In the previous experiment, among the 5293 valid and unique genes of the dataset, NPCPS (C(0.01) = 1.628) detected 1598 DGE genes, and 17 of the 36 top-ranked genes were reported as relevant to breast cancer or other known cancers. Applying the proposed method to the same dataset detected 2279 DGE genes for C(0.01) = 1.628 (1258 over-expressed and 1021 under-expressed genes) and 3974 DGE genes for C(0.05) = 1.358 (2230 over-expressed and 1744 under-expressed genes). All of the top 50 ranked genes were reported as cancer-relevant.

Most of the recognized differentially expressed genes have been reported in published papers as directly involved in cancer, such as AGER and MAPK14. Some genes have not themselves been reported, but their related genes, proteins, or behaviors have been reported as cancer-relevant, such as DGKD (EGFR and DAG related, ranked 481) [20]. Some of the genes with a higher Dn statistic are suspected to be involved in cancer cell lines. For example, gene CCDC130 (ranked 384) is potentially cancer-relevant, and its role in cancer cell signaling is currently being characterized [21]. Some genes ranked in the top 500, such as AHDC1 (ranked 159), LIG3 (ranked 409) and DMD (ranked 75), have not yet been formally reported as cancer-relevant. However, given the significant difference between the cancer and normal groups, it is reasonable to assume that these genes may well participate in cancer development.

Some of the top 50 genes are listed in Table 6 with their cancer-relevant descriptions [22]–[53]. The data distributions of two typically ranked genes are shown in Fig. 3 and Fig. 4. The estimated change point clearly locates the actual changing point in the gene expression data. In particular, the cancer samples that are 'more over-expressed' than the sample at the change point can be recognized as those located in the area delimited by the red dashed CP lines.

Figure 3. Data distribution of Gene AGER, ranked 1st by WCPS.

(A) Empirical distribution functions of cancer and normal group, respectively, with the expression value at the change-point. (B) Expression data by samples, as well as expression value at the change-point.

https://doi.org/10.1371/journal.pone.0029860.g003

Figure 4. Data distribution of Gene DECR1, ranked 3487th by WCPS.

(A) Empirical distribution functions of cancer and normal group, respectively, with the expression value at the change-point. (B) Expression data by samples, as well as expression value at the change-point.

https://doi.org/10.1371/journal.pone.0029860.g004

The number of DGE samples for each gene was calculated, and the corresponding histogram of detected DGE genes is displayed in Fig. 5. For example, there are 1440 non-DGE genes; 376 genes have DGE in 4 cancer samples; 164 genes have DGE in 12 cancer samples. Given the cancer group size of 24, this histogram demonstrates that DGE may exist in only a subgroup of the cancer samples.

Figure 5. Histogram: number of DGE genes by size of sample subsets containing DGE.

There are 1440 non-DGE genes; 376 genes have DGE in 4 cancer samples; 164 genes have DGE in 12 cancer samples.

https://doi.org/10.1371/journal.pone.0029860.g005

Accordingly, the number of differentially expressed genes in each cancer sample is calculated as shown in Fig. 6. For example, there are 1057 DGE genes in cancer sample 8, 1380 DGE genes in cancer sample 19, and 1682 DGE genes in cancer sample 23.

Figure 6. Number of significant DGE genes in each cancer sample of Dataset I.

There are 1057 DGE genes in cancer sample 8; 1380 DGE genes in cancer sample 19; 1682 DGE genes in cancer sample 23.

https://doi.org/10.1371/journal.pone.0029860.g006

Result on Dataset II.

As described in the Materials and Methods section, Dataset II contains microarray data of 42 samples of 12576 genes: 18 samples of histologically normal (HN) epithelium from breast cancer patients, 6 samples from high-risk prophylactic mastectomy (PM) patients, and 18 samples from reduction mammoplasty patients. Applying WCPS to the dataset with a threshold of 1.358 yields 9793 over-expressed genes and 190 under-expressed genes; with a threshold of 1.628, the numbers drop to 867 over-expressed and 10 under-expressed genes. This dataset evidently contains mostly over-expressed genes. Among the 50 top-ranked genes, 43 genes have been clearly reported as relevant to human cancer. Among the remaining 6 genes, the third-ranked gene, AP000944.1, is a lincRNA, and the functional role of long non-coding RNA in human cancer has drawn research attention [54]; the CENPM gene itself has not yet been reported as cancer-relevant, but inappropriate expression of the centromere proteins CENP-A and CENP-H could be a major cause of chromosomal instability, which has been recognized as a hallmark of human cancer [55]; the 50th-ranked gene, HPN, cooperates with MYC in the progression of adenocarcinoma in a prostate cancer mouse model [56].

NPCPS was also applied to this dataset and yielded 2564 and 337 differentially expressed genes with threshold 1.358 and 1.628, respectively.

WCPS detected many more differentially expressed genes than NPCPS. Moreover, the rankings produced by the two methods agree only to about 50 percent. WCPS successfully recognized genes that were ranked lower or ignored by NPCPS. Fig. 7 and Fig. 8 show the expression data of several such genes.

Figure 7. Data distribution and CP of Gene DHCR24 in Dataset II.

(A) Empirical distribution functions of cancer and normal group, respectively, with the expression value at the change-point. (B) Expression data by samples, as well as expression value at the change-point.

https://doi.org/10.1371/journal.pone.0029860.g007

Figure 8. Data distribution and CP of Gene PARP12 in Dataset II.

(A) Empirical distribution functions of cancer and normal group, respectively, with the expression value at the change-point. (B) Expression data by samples, as well as expression value at the change-point.

https://doi.org/10.1371/journal.pone.0029860.g008

Fig. 9 illustrates the total number of DGE genes in each HN sample. HN samples 11 and 18, two ER+ breast cancer patient samples, have more than 6000 differentially expressed genes each. HN samples 1, 2 and 9 (three ER− breast cancer patient samples) and HN sample 13 (an ER+ patient sample) have more than 2000 differentially expressed genes each. Fig. 10 shows the top-ranked gene by WCPS.

Figure 9. Number of significant DGE genes in each HN sample of Dataset II.

HN samples 11 and 18, two ER+ breast cancer patient samples, have more than 6000 differentially expressed genes each. HN samples 1, 2 and 9 (three ER− breast cancer patient samples) and HN sample 13 (an ER+ patient sample) have more than 2000 differentially expressed genes each.

https://doi.org/10.1371/journal.pone.0029860.g009

Figure 10. Data distribution and CP of Gene DAPP1 in Dataset II.

Expression in the PM samples (the case group) is not only higher than in the RM samples but also generally higher than in the 18 HN samples from breast cancer patients.

https://doi.org/10.1371/journal.pone.0029860.g010

The 6 PM samples are from high-risk women and, according to the work by Graham et al., gene expression in histologically normal epithelium from breast cancer patients and from cancer-free PM patients shares a similar profile [57]. Therefore, we also tested a dataset consisting of the 6 PM samples as the case group and the 18 RM samples as the control group. With threshold C(0.05) = 1.358, there are 7344 over-expressed genes and 79 under-expressed genes. Fig. 10 shows one of the top-ranked genes, for which expression in the PM samples (the case group) is not only higher than in the RM samples but also generally higher than in the 18 HN samples from breast cancer patients.

Fig. 11 summarizes the number of DGE genes in each PM sample. PM samples 1, 2, 4, and 6 have significantly more DGE genes than PM samples 3 and 5. This result corresponds to the average expression of all 12576 genes in the 6 samples.

Figure 11. Number of significant DGE genes in each PM sample of Dataset II.

PM samples 1, 2, 4, and 6 have significantly more DGE genes than PM samples 3 and 5. This result corresponds to the average expression of the 12576 genes in the 6 samples.

https://doi.org/10.1371/journal.pone.0029860.g011

Materials and Methods

Change-point in gene expression

The method proposed here inherits the definition of change-point described in NPCPS [18]. Consider the gene expression values as a sequence of independent variables X, as given in Eq. (1).

Here, X1 contains the expression values of normal samples with known distribution function F1(x), and X2 contains the expression values of cancer samples. Over- or under-expression values in X2 would result in a change point in X. The existence of a change point is evaluated by a modified Kolmogorov statistic (K-statistic), which indicates the distance between two distribution functions. Suppose F1⁻¹(y) is the inverse function of F1(x), defined in Eq. (2), where y is a variable increasing with a fixed step chosen by the user. The testing procedure is then defined in Eq. (3), where [n*t] means rounding toward negative infinity. X has a change point when Dn > C(α), where C(α) is the critical value and α is the significance level; typical values include C(0.05) = 1.358 and C(0.01) = 1.628.
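To make the idea of a Kolmogorov-type distance between two distribution functions concrete, the sketch below computes the classical scaled two-sample Kolmogorov-Smirnov statistic from the empirical distribution functions of the normal and cancer groups and compares it with a critical value. This is a stand-in for illustration only: it is not the modified K-statistic Dn of NPCPS, it gives no change-point estimate, and the function names and the default threshold argument are assumptions.

```python
import math
from bisect import bisect_right

def scaled_ks_statistic(normal, cancer):
    """sqrt(n1*n2/(n1+n2)) * sup_x |F1(x) - F2(x)| for two samples,
    where F1 and F2 are the empirical distribution functions."""
    n1, n2 = len(normal), len(cancer)
    a, b = sorted(normal), sorted(cancer)
    d = 0.0
    # The supremum of |F1 - F2| is attained at one of the observed values.
    for x in a + b:
        f1 = bisect_right(a, x) / n1
        f2 = bisect_right(b, x) / n2
        d = max(d, abs(f1 - f2))
    return math.sqrt(n1 * n2 / (n1 + n2)) * d

def has_distribution_change(normal, cancer, critical_value=1.358):
    """True when the scaled statistic exceeds the chosen critical value."""
    return scaled_ks_statistic(normal, cancer) > critical_value
```

The threshold is left as a parameter so that either of the critical values quoted in the text can be passed in.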

Weighted Change-point Statistic

The aim of NPCPS is to find the largest Dn and check whether it exceeds the threshold; the position of the largest Dn value indicates the most significant change in the expression profile of a single gene. According to the ROC curves obtained from the simulation study [18], NPCPS was more than 99% correct when more than 9 samples of a single gene contained DGE. However, NPCPS is not very sensitive to the right bound, as shown in Fig. 12. When only a small subset of the cancer group has DGE, especially when k<5, NPCPS yields inadequate Dn values and consequently does not always report the existence of a change point. Fig. 13 illustrates the descending trend of the Dn value: when no simulated DGE is added to the normally distributed data, the Dn function shows a descending curve.

Figure 12. Estimated change point and type II error of NPCPS.

NPCPS is not very sensitive to the right bound in terms of type II error and estimated CP position. Both estimated change point and type II error of WCPS show better results compared with NPCPS.

https://doi.org/10.1371/journal.pone.0029860.g012

Figure 13. Dn values of each sample in a gene expression profile without DGE.

In simulated data without any DGE, Dn value has a descending trend when approaching the right bound. Weighted Dn moderately compensated the descending trend of Dn statistic.

https://doi.org/10.1371/journal.pone.0029860.g013

Therefore, to enhance the right-bound sensitivity, it is reasonable to assume that adding a proper weight function to the original function could adequately compensate the Dn statistic even if the change occurs in the last few data points. The goal of the weight function is to moderately compensate the right end of the Dn statistic, avoiding a rigid positive result, while keeping the Dn values at the left end and in the middle as unchanged as possible; this suggests a function similar to 1/x. In addition, as Dn is a step function, the weight function should have the same step as the Dn statistic.

The weight function, shown in Fig. 14, is given in Eq. (4), and the weighted Dn is defined in Eq. (5).

Figure 14. Weight function in WCPS.

The ascending curve would compensate the descending trend of original Dn statistic.

https://doi.org/10.1371/journal.pone.0029860.g014

The weighted Dn function responded better to small subsets with DGE, as shown in Fig. 12: both the estimated change point and the type II error of WCPS are better than those of NPCPS. In addition, Fig. 13 shows that adding the weight function does not cause an unreasonable rise at the right bound when there is no DGE in any sample of the simulated data.

Experiment on Breast cancer microarray dataset

Two datasets were tested in the experiment. The first microarray dataset of breast cancer (referred to as Dataset I) [58], the same dataset used in [18], includes 49 samples, all from cancer tissues, with different lymph node (LN) and estrogen receptor (ER) status, i.e. LN+ER−/LN+ER+/LN−ER+/LN−ER−. As lymph-node-negative breast cancer is categorized as early-stage breast cancer, these 49 samples were divided into two types: 25 samples with negative lymph nodes as the normal samples and 24 samples with positive lymph nodes as the cancer samples. The gene expression profile of 7129 genes in the samples was obtained through the annotation package hu6800 [59]. Probes of genes that are obsolete in the NCBI gene bank were deleted; for multiple probes mapping to the same gene, only the probe with the largest Dn was kept. These two steps resulted in a total of 5293 genes. This dataset was tested with all the methods mentioned in the simulation study. Before LRS, COPA, TriMOST, TriORT, OS, and the T-statistic were applied, the gene expression values were normalized. Before WCPS was applied, the expression values in the cancer group were sorted in ascending order for each gene.
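The two probe-filtering steps described above can be sketched as follows. This is a minimal illustration under assumptions: probes of obsolete genes are represented by a gene value of None, the per-probe Dn values are assumed to be already computed, and the function name and the dictionary-based inputs are hypothetical.

```python
def keep_best_probe(probe_to_gene, probe_dn):
    """Return {gene: probe}, keeping for each gene the probe with the
    largest Dn. Probes mapped to None (obsolete gene) are dropped."""
    best = {}
    for probe, gene in probe_to_gene.items():
        if gene is None:
            continue  # step 1: discard probes of obsolete genes
        # step 2: among probes of the same gene, keep the largest Dn
        if gene not in best or probe_dn[probe] > probe_dn[best[gene]]:
            best[gene] = probe
    return best
```

For example, with the made-up inputs {"p1": "G1", "p2": "G1", "p3": None} and Dn values {"p1": 0.8, "p2": 1.9, "p3": 2.5}, only probe p2 is kept for gene G1.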

The second (referred to as Dataset II) is a 42-sample dataset obtained on the Affymetrix Human Genome U133A Array platform. The samples comprise 3 subsets: 18 samples of normal breast epithelium from reduction mammoplasty patients (RM samples); 18 samples of histologically normal breast epithelium from 9 ER+ and 9 ER− breast cancer patients (HN samples); and 6 samples of histologically normal breast epithelium from prophylactic mastectomy patients (PM samples) [57]. In the original article, the 18 RM samples and 6 PM samples were considered the control group, while the 18 HN samples were the case group. This dataset was tested with WCPS.

For NPCPS, LRS, TriMOST, TriORT, COPA, OS and the T-statistic, the genes were ranked in descending order of the respective statistic. Genes ranked near the top have a higher degree of DGE.

For WCPS, the change-point was determined by the weighted Dn statistic. Genes with weighted Dn larger than C(α) were recognized as having DGE. Specifically, for the detection result under C(0.05) = 1.358, and based on the type of DGE (over-expressed or under-expressed), the sample values that exceed the expression value at the change-point can be identified at the single-gene level. This results in an array of binary values for each gene, where 0 indicates a non-DGE sample and 1 indicates a significant DGE sample. For all genes in a dataset, these arrays can be combined into a matrix. From the matrix, the number of DGE genes in each cancer sample, or the size of the DGE cancer sample subset for each gene, can be calculated.
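The binary matrix and the two summaries derived from it can be sketched as follows. The sketch assumes that the expression value at the change-point is already known for each gene, and that for under-expressed genes "exceeding" means falling below that value; both are assumptions made for illustration, and the function names are hypothetical.

```python
def dge_indicator(cancer_values, cp_value, over_expressed=True):
    """Binary array for one gene: 1 marks a cancer sample whose value lies
    beyond the change-point value, 0 marks a non-DGE sample."""
    if over_expressed:
        return [1 if v > cp_value else 0 for v in cancer_values]
    # ASSUMPTION: for under-expression, DGE samples lie below the CP value.
    return [1 if v < cp_value else 0 for v in cancer_values]

def summarize(matrix):
    """matrix holds one indicator row per gene (columns = cancer samples).
    Returns (DGE genes per sample, DGE samples per gene)."""
    genes_per_sample = [sum(col) for col in zip(*matrix)]
    samples_per_gene = [sum(row) for row in matrix]
    return genes_per_sample, samples_per_gene
```

The first returned list corresponds to the per-sample counts plotted in Fig. 6, and a histogram of the second list corresponds to Fig. 5.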

Author Contributions

Conceived and designed the experiments: YW YL GS ZJ. Performed the experiments: YW. Analyzed the data: YW. Contributed reagents/materials/analysis tools: YW ZJ. Wrote the paper: YL YW CX.

References

  1. 1. Brent R (2000) Genomic biology. Cell 100: 169–183.R. Brent2000Genomic biology.Cell100169183
  2. 2. Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24(3): 227–235.DT RossU. ScherfMB EisenCM PerouC. Rees2000Systematic variation in gene expression patterns in human cancer cell lines.Nat Genet243227235
  3. 3. Liang P, Pardee AB (2003) Analysing differential gene expression in cancer. Nature Reviews Cancer 3: 869–876.P. LiangAB Pardee2003Analysing differential gene expression in cancer.Nature Reviews Cancer3869876
  4. 4. Sørlie T, Tibshirani R, Parker J, Hastie T, Marron JS, et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100: 8418–8423.T. SørlieR. TibshiraniJ. ParkerT. HastieJS Marron2003Repeated observation of breast tumor subtypes in independent gene expression data sets.Proc Natl Acad Sci USA10084188423
  5. 5. Efron B, Tibshirani R, Storey J, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96: 1151–1160.B. EfronR. TibshiraniJ. StoreyV. Tusher2001Empirical Bayes analysis of a microarray experiment.J Am Stat Assoc9611511160
  6. 6. Pan W, Lin J, Le C (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genomics 3(3): 117–124.W. PanJ. LinC. Le2003A mixture model approach to detecting differentially expressed genes with microarray data.Funct Integr Genomics33117124
  7. 7. Storey JD, Tibshirani R (2003) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays, in the Analysis of Gene Expression Data: Methods and Software,. In: Parmigani G, Garrett ES, Irizarry RA, Zeger SL, editors. New York: Springer. JD StoreyR. Tibshirani2003SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays, in the Analysis of Gene Expression Data: Methods and Software,G. ParmiganiES GarrettRA IrizarrySL ZegerNew YorkSpringer
  8. 8. Lian H (2007) MOST: detecting cancer differential gene expression. Biostatistics 9(3): 411–418.H. Lian2007MOST: detecting cancer differential gene expression.Biostatistics93411418
  9. 9. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, et al. (2005) Recurrent fusion of TMPRSS2 and ETS transcription factor genes in Prostate Cancer. Science 10(310): 644–648.SA TomlinsDR RhodesS. PernerSM DhanasekaranR. Mehra2005Recurrent fusion of TMPRSS2 and ETS transcription factor genes in Prostate Cancer.Science10310644648
  10. 10. MacDonald JW, Ghosh D (2006) COPA-cancer outlier profile analysis. Bioinformatics 22: 2950–2951.JW MacDonaldD. Ghosh2006COPA-cancer outlier profile analysis.Bioinformatics2229502951
  11. 11. Lyons-Weiler J, Patel S, Becich MJ, Godfrey TE (2004) Tests for finding complex patterns of differential expression in cancers: towards individualized medicine. BMC Bioinformatics 5: 110.J. Lyons-WeilerS. PatelMJ BecichTE Godfrey2004Tests for finding complex patterns of differential expression in cancers: towards individualized medicine.BMC Bioinformatics5110
  12. 12. Hu JH (2008) Cancer outlier detection based on likelihood ratio test. Bioinformatics 24(19): 2193–2199.JH Hu2008Cancer outlier detection based on likelihood ratio test.Bioinformatics241921932199
  13. 13. Tibshirani R, Hastie T (2007) Outlier sums for differential gene expression analysis. Biostatistics 8: 2–8.R. TibshiraniT. Hastie2007Outlier sums for differential gene expression analysis.Biostatistics828
  14. 14. Wu B (2007) Cancer outlier differential gene expression detection. Biostatistics 8(3): 566–575.B. Wu2007Cancer outlier differential gene expression detection.Biostatistics83566575
  15. 15. Ji ZH, Wang Y, Wu CG, Wu XZ, Xing C, Liang YC (2010) Mean, Median and Tri-Mean Based Statistical Detection Methods for Differential Gene Expression in Microarray Data. 3rd International Congress on Image and Signal Processing, 3rd International Conference on BioMedical Engineering and Informatics (CISP'10-BMEI'10) 7: 3142–3146.ZH JiY. WangCG WuXZ WuC. XingYC Liang2010Mean, Median and Tri-Mean Based Statistical Detection Methods for Differential Gene Expression in Microarray Data.3rd International Congress on Image and Signal Processing, 3rd International Conference on BioMedical Engineering and Informatics (CISP'10-BMEI'10)731423146
  16. 16. Lian H (2008) MOST: detecting cancer differential gene expression. Biostatistics 9(3): 411–418.H. Lian2008MOST: detecting cancer differential gene expression.Biostatistics93411418
  17. 17. Ji ZH, Wu CG, Wang Y, Guan RC, Tu HW, et al. (2010) Tri-Mean Based Statistical Differential Gene Expression Detection. Int J Data Mining and Bioinformatics. ZH JiCG WuY. WangRC GuanHW Tu2010Tri-Mean Based Statistical Differential Gene Expression Detection.Int J Data Mining and BioinformaticsIn press. In press.
  18. 18. Wang Y, Wu C, Ji Z, Wang B, Liang Y (2011) Non-Parametric Change-Point Method for Differential Gene Expression Detection. PLoS ONE 6(5): e20060.Y. WangC. WuZ. JiB. WangY. Liang2011Non-Parametric Change-Point Method for Differential Gene Expression Detection.PLoS ONE65e20060
  19. 19. Tan ZP, Miao BQ (2000) Nonparametric Statistical Inference for Distribution Change Point Problems. Journal of China University of Science and Technology 6(3): 270–276.ZP TanBQ Miao2000Nonparametric Statistical Inference for Distribution Change Point Problems.Journal of China University of Science and Technology63270276
  20. 20. Griner EM, Kazanietz MG (2007) Protein kinase C and other diacylglycerol effectors in cancer. Nature Reviews Cancer 7: 281–294.EM GrinerMG Kazanietz2007Protein kinase C and other diacylglycerol effectors in cancer.Nature Reviews Cancer7281294
  21. 21. Kumar D (2011) Project information of “Functional Characterization of CCDC130 Gene”. D. Kumar2011Project information of “Functional Characterization of CCDC130 Gene”.Available:http://projectreporter.nih.gov/project_info_description.cfm?icde=0&aid=7560184 via the internet. Accessed 20 Aug 2011. Available:http://projectreporter.nih.gov/project_info_description.cfm?icde=0&aid=7560184 via the internet. Accessed 20 Aug 2011.
  22. 22. Kuniyasu H, Oue N, Wakikawa A, Shigeishi H, Matsutani N, et al. (2002) Expression of receptors for advanced glycation end-products (RAGE) is closely associated with the invasive and metastatic activity of gastric cancer. J Pathol 196(2): 163–70.H. KuniyasuN. OueA. WakikawaH. ShigeishiN. Matsutani2002Expression of receptors for advanced glycation end-products (RAGE) is closely associated with the invasive and metastatic activity of gastric cancer.J Pathol196216370
  23. Field JK, Liloglou T, Warrak S, Burger M, Becker E, et al. (2005) Methylation discriminators in NSCLC identified by a microarray based approach. Int J Oncol 27(1): 105–11.
  24. Smith PG, Wang F, Wilkinson KN, Savage KJ, Klein U, et al. (2005) The phosphodiesterase PDE4B limits cAMP-associated PI3K/AKT-dependent apoptosis in diffuse large B-cell lymphoma. Blood 105(1): 308–16.
  25. Han YC, Zeng XX, Wang R, Zhao Y, Li BL, et al. (2007) Correlation of p38 mitogen-activated protein kinase signal transduction pathway to uPA expression in breast cancer [article in Chinese]. Ai Zheng 26(1): 48–53.
  26. Banine F, Bartlett C, Gunawardena R, Muchardt C, Yaniv M, et al. (2005) SWI/SNF chromatin-remodeling factors induce changes in DNA methylation to promote transcriptional activation. Cancer Res 65(9): 3542–7.
  27. Arce L, Yokoyama NN, Waterman ML (2006) Diversity of LEF/TCF action in development and disease. Oncogene 25: 7492.
  28. Liu Y, Gao M, Lv YM, Yang X, Ren YQ, et al. (2011) Confirmation by Exome Sequencing of the Pathogenic Role of NCSTN Mutations in Acne Inversa (Hidradenitis Suppurativa). Journal of Investigative Dermatology 131(7): 1570–1572.
  29. Chong PK, Lee H, Loh MC, Choong LY, Lin Q, et al. (2010) Upregulation of plasma C9 protein in gastric cancer patients. Proteomics 10(18): 3210–21.
  30. Pasini FS, Maistro S, Campofiorito CM, Mangone FR, Walder F, et al. (2006) SCARB2 and CSNK1 double negative mRNA expression seems to be predictive of the presence of non-compromised lymph nodes in oral squamous cell carcinoma. Proc Amer Assoc Cancer Res, Volume 47.
  31. Davies SR, Watkins G, Douglas-Jones A, Mansel RE, Jiang WG (2008) Bone morphogenetic proteins 1 to 7 in human breast cancer, expression pattern and clinical/prognostic relevance. J Exp Ther Oncol 7(4): 327–38.
  32. Lu J, McKinsey TA, Nicol RL, Olson EN (2000) Signal-dependent activation of the MEF2 transcription factor by dissociation from histone deacetylases. Proc Natl Acad Sci USA 97(8): 4070–4075.
  33. Glozak MA, Seto E (2007) Histone deacetylases and cancer. Oncogene 26: 5420–5432.
  34. Kumar S, Perlman E, Harris CA, Raffeld M, Tsokos M (2000) Myogenin is a Specific Marker for Rhabdomyosarcoma: An Immunohistochemical Study in Paraffin-Embedded Tissues. Mod Pathol 13(9): 988–993.
  35. Kim JH, You KR, Kim IH, Cho BH, Kim CY, et al. (2004) Over-expression of the ribosomal protein L36a gene is associated with cellular proliferation in hepatocellular carcinoma. Hepatology 39(1): 129–38.
  36. Renier C, Vogel H, Offor O, Yao C, Wapnir I (2010) Breast cancer brain metastases express the sodium iodide symporter. Journal of Neuro-Oncology 96(3): 331–336.
  37. Hayashi H, Nabeshima K, Aoki M, Hamasaki M, Enatsu S, et al. (2010) Overexpression of IQGAP1 in advanced colorectal cancer correlates with poor prognosis-critical role in tumor invasion. Int J Cancer 126(11): 2563–74.
  38. Cheng CW, Yu JC, Wang HW, Huang CS, Shieh JC, et al. (2010) The clinical implications of MMP-11 and CK-20 expression in human breast cancer. Clin Chim Acta 411(3–4): 234–41.
  39. Miettinen M (1987) Synaptophysin and neurofilament proteins as markers for neuroendocrine tumors. Arch Pathol Lab Med 111(9): 813–8.
  40. Yang Y, Wu PP, Wu J, Shen WW, Wu YL, et al. (2008) Expression of anion exchanger 2 in human gastric cancer. Exp Oncol 30(1): 81–7.
  41. Dunn TA, Chen S, Faith DA, Hicks JL, Platz EA, et al. (2006) A Novel Role of Myosin VI in Human Prostate Cancer. Am J Pathol 169(5): 1843–1854.
  42. Drenth JP, Goertz J, Daha MR, van der Meer JW (1996) Immunoglobulin D enhances the release of tumor necrosis factor-alpha, and interleukin-1 beta as well as interleukin-1 receptor antagonist from human mononuclear cells. Immunology 88: 355–62.
  43. Han X, Guo J, Deng W, Zhang C, Du P, et al. (2008) High-throughput cell-based screening reveals a role for ZNF131 as a repressor of ERalpha signaling. BMC Genomics 9: 476.
  44. Motadi L, Dlamini Z, Mbita Z, Bhoola K (2005) Involvement of RbBP6 gene and apoptosis in the pathogenesis of lung cancer. Proc Amer Assoc Cancer Res, Volume 46, Abstract #3633.
  45. Hayashi H, Nabeshima K, Aoki M, Hamasaki M, Enatsu S, et al. (2010) Overexpression of IQGAP1 in advanced colorectal cancer correlates with poor prognosis-critical role in tumor invasion. Int J Cancer 126(11): 2563–74.
  46. Zhang H, Constantine R, Vorobiev S, Chen Y, Seetharaman J, et al. (2011) UNC119 is required for G protein trafficking in sensory neurons. Nature Neuroscience 14: 874–880.
  47. Entschladen F, Zänker KS, Powe DG (2011) Heterotrimeric G protein signaling in cancer cells with regard to metastasis formation. Cell Cycle 10(7): 1086–91.
  48. Menigatti M, Cattaneo E, Sabates-Bellver J, Ilinsky VV, Went P, et al. (2009) The protein tyrosine phosphatase receptor type R gene is an early and frequent target of silencing in human colorectal tumorigenesis. Molecular Cancer 8: 124.
  49. Wu P, Tian Y, Chen G, Wang B, Gui L, et al. (2010) Ubiquitin B: an essential mediator of trichostatin A-induced tumor-selective killing in human cancer cells. Cell Death Differ 17(1): 109–18.
  50. Seiler A, Schneider M, Förster H, Roth S, Wirth EK, et al. (2008) Glutathione peroxidase 4 senses and translates oxidative stress into 12/15-lipoxygenase dependent- and AIF-mediated cell death. Cell Metab 8(3): 237–48.
  51. Ma Z, Nie Z, Luo R, Casanova JE, Ravichandran KS (2007) Regulation of Arf6 and ACAP1 signaling by the PTB-domain containing adapter protein GULP. Curr Biol 17(8): 722–727.
  52. Hashimoto S, Onodera Y, Hashimoto A, Tanaka M, Hamaguchi M, et al. (2004) Requirement for Arf6 in breast cancer invasive activities. PNAS 101(17): 6647–6652.
  53. Zegerman P, Bannister AJ, Kouzarides T (2000) The putative tumour suppressor Fus-2 is an N-acetyltransferase. Oncogene 19(1): 161–3.
  54. Gibb EA, Brown CJ, Lam WL (2011) The functional role of long non-coding RNA in human carcinomas. Molecular Cancer 10: 38.
  55. Tomonaga T, Matsushita K, Ishibashi M, Nezu M, Shimada H, Ochiai T, et al. (2005) Centromere protein H is up-regulated in primary human colorectal cancer and its overexpression induces aneuploidy. Cancer Res 65: 4683.
  56. Nandana S, Ellwood-Yen K, Sawyers C, Wills M, Weidow B, et al. (2010) Hepsin cooperates with MYC in the progression of adenocarcinoma in a prostate cancer mouse model. Prostate 70(6): 591–600.
  57. Graham K, Morenas A, Tripathi A, King C, Kavanah M, et al. (2010) Gene expression in histologically normal epithelium from breast cancer patients and from cancer-free prophylactic mastectomy patients shares a similar profile. British Journal of Cancer 102: 1284–1293.
  58. West M, Blanchette C, Dressman H, Huang E, Ishida S, et al. (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proceedings of the National Academy of Sciences of the United States of America 98: 11462–11467.
  59. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5: R80.