Abstract
For studying cancer and genetic diseases, identifying highly correlated genes from high-dimensional data is an important problem. It is a great challenge to select relevant biomarkers from gene expression data that contains important correlation structures, where some of the genes can be divided into different groups with a common biological function, chromosomal location or regulation. In this paper, we propose a penalized accelerated failure time model, CHR-DE, using a non-convex regularization (local search) with differential evolution (global search) in a wrapper-embedded memetic framework. The complex harmonic regularization (CHR) can approximate the combination of the ℓp (0 < p < 1) and ℓq (1 ≤ q < 2) penalties for selecting biomarkers in groups. Differential evolution (DE) is utilized to globally optimize CHR's hyperparameters, which gives CHR-DE a strong capability of selecting groups of genes in high-dimensional biological data. We also developed an efficient path seeking algorithm to optimize this penalized model. The proposed method is evaluated on synthetic data and three gene expression datasets: breast cancer, hepatocellular carcinoma and colorectal cancer. The experimental results demonstrate that CHR-DE is an effective tool for feature selection and learning prediction.
Citation: Wang S, Shen H-W, Chai H, Liang Y (2019) Complex harmonic regularization with differential evolution in a memetic framework for biomarker selection. PLoS ONE 14(2): e0210786. https://doi.org/10.1371/journal.pone.0210786
Editor: Suzannah Rutherford, Fred Hutchinson Cancer Research Center, UNITED STATES
Received: March 8, 2018; Accepted: January 2, 2019; Published: February 14, 2019
Copyright: © 2019 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: We demonstrate our proposed methods by analysing microarray expression data from NCBI’s gene expression omnibus (GEO) with the accession number as follows. (1) breast cancer (GSE22210) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22210 (2) hepatocellular carcinoma (HCC, GSE10141) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10141 (3)colorectal cancer (CRC, GSE103479) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103479.
Funding: This work was supported by the Macau Science and Technology Develop Funds (Grant No. 003/2016/AFJ) of Macao SAR of China and China NSFC project under contract 61661166011 to YL.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Feature selection is a great step forward for selecting biomarkers in biological data with high dimensionality and small sample sizes. Among the various kinds of feature selection methods, the regularization methods embed different penalty functions into the learning procedure as a single process and have a lower risk of over-fitting. The best known penalty is the least absolute shrinkage and selection operator (Lasso, ℓ1-norm) [1], which performs continuous shrinkage and feature selection at the same time. Other ℓ1-norm type regularization methods include the smoothly clipped absolute deviation (SCAD) [2], group lasso [3], minimax concave penalty (MCP) [4], etc. Besides, Xu et al [5] proved that when 0 < p < 1/2, there is no significant difference in the performance of the ℓp-norms, while the computational complexity of solving the ℓ1/2 regularization is much lower than that of the ℓ0-norm; when 1/2 ≤ p < 1, the solutions of the ℓp regularization become sparser as p declines. Under this theory, Chu et al [6] proposed a naïve harmonic regularization that can approximate the ℓp (0 < p < 1) penalties.
One limitation of these ℓ1-norm type regularizations is that when the data set contains strong correlations among the predictors, they tend to select only one feature from each group without regard to which one is selected, even though these groups may correspond to gene pathways in gene expression data. In theory, a strictly convex penalty function provides a sufficient condition for the grouping effect of variables, and the ℓq-norm (q > 1) penalty guarantees strict convexity [7]. Zou and Hastie [8] proposed the Elastic net, which mixes the ℓ1 and ℓ2 penalties. Subsequent regularization methods without prior knowledge that incorporate the ℓ2-norm for selecting groups of variables include SCAD-ℓ2 [7], ℓ1/2 + ℓ2 [9], and so on. There are also regularization methods with prior knowledge, such as the group lasso [3], which has been used for the multivariate analysis of variance model, where each factor may have several levels and can be expressed by a group of dummy variables. In this article, we employ a complex harmonic regularization (CHR) [10] that approximates the combination of the ℓp (0 < p < 1) and ℓq (1 ≤ q < 2) penalties to select the key factors in groups among all features. This approach avoids determining the value of p or q in advance, i.e., we do not need to assume the probability distribution of the data before evaluating the grouping effect and sparsity, as the existing regularization methods do.
However, the hyperparameters of CHR are sensitive to the resolution, and hyperparameter tuning is typically done by expert analysis, evolutionary algorithms, Bayesian optimization or grid search [11]. Jaderberg et al [12] efficiently set the hyperparameters of neural networks based on the genetic algorithm (GA). Liu et al [13] proposed a hybrid genetic algorithm that combines a genetic algorithm with an embedded ℓ1/2 + ℓ2 regularization. Such evolutionary algorithms are well suited to tuning the hyperparameters of these multimodal penalty functions. GA [14] is the most widely used one in the literature. However, GA converges much more slowly to the optimum for high-dimensional problems; consequently, it cannot handle learning models with many hyperparameters. A popular swarm-intelligence-based algorithm is particle swarm optimization (PSO) [15], which is well adapted to the optimization of nonlinear functions in multidimensional space. Differential evolution (DE) [16] was specifically proposed for continuous search spaces and is very simple to implement. Vesterstrom and Thomsen [17] evaluated the performance of GA, DE and PSO regarding their general applicability as numerical optimization techniques and concluded that DE is less sensitive to parameter changes than other metaheuristic algorithms. Therefore, DE can rightfully be regarded as an excellent choice for hyperparameter optimization.
The memetic algorithm [18] is now widely used as a synergy of an evolutionary or any population-based approach with separate individual learning or local improvement procedures for problem search. The evolution strategy (ES) is the oldest evolutionary algorithm, based on adaptation and evolution. The covariance matrix adaptation evolution strategy (CMA-ES) [19] is one of the most recent and powerful versions of the memetic algorithm, combining evolution strategies with local information. The gene-pool optimal mixing evolutionary algorithm (GOMEA) performs local search with a strong mathematical grounding for generating solutions, but it is considered an EA for discrete optimization problems [20]. Recently, Bouter et al. [21] proposed the real-valued GOMEA (RV-GOMEA) to cover real-valued search spaces. Besides, the memetic framework [22] models memetic algorithms as a process involving feature selection and a learning procedure. In this paper, we present a wrapper-embedded memetic framework that utilizes DE to globally optimize the hyperparameters of the non-convex regularization CHR, which serves as a local search to select biomarkers in groups.
The workflow of our proposed algorithm is shown in Fig 1. Microarray gene expression data for a certain cancer are collected and processed into a matrix file that contains the genes (rows) and tissue samples (columns). After the CHR's hyperparameters are set in the DE procedure, CHR starts the learning procedure and then feeds the fitness values back to update its hyperparameters. With a fully trained model, we obtain some groups of genes with non-zero coefficients, which may be valid biomarkers for this cancer.
In order to identify tumor subclasses that are both biologically meaningful and clinically relevant, we apply differential evolution (DE) to fine-tune the hyperparameters of the complex harmonic regularization (CHR). After the operations of the DE procedure, namely differential mutation, crossover, adaptive local search and selection, the CHR is used in the learning procedure, and the fitness values are fed back to update its hyperparameters.
The remainder of this paper is organized as follows: the CHR method for survival data in the accelerated failure time (AFT) model is presented in Section 2, the implementation of tuning CHR's hyperparameters is introduced in Section 3, the experimental results and discussions are illustrated in Section 4, and concluding remarks are finally made in Section 5.
2 Complex harmonic penalized accelerated failure time model
2.1 Accelerated failure time model
Suppose X denotes the h × k data matrix whose rows are Xi = (xi1, xi2, …, xik), 1 ≤ i ≤ h, and T denotes the sample vector of lifetimes or times to a certain event of interest (τ1, τ2, …, τh)T. Throughout this article we consider failure times (or survival times) that are right censored: the survival time τi = min(ti, ci), where ti is the true survival time and ci is the time to the first censoring event (e.g., study conclusion, date of final follow-up) for each subject i. Our survival data consist of independent observations {(Xi, τi, δi), i = 1, …, h}, where δ is the censoring indicator: δi = 0 represents a right-censored time and δi = 1 a completed (observed) time.
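As a concrete illustration of this notation, the following sketch (in Python, with hypothetical exponential rates) shows how the right-censored observations (τi, δi) arise from latent event and censoring times:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 6
true_time = rng.exponential(scale=5.0, size=h)    # latent true survival times t_i
censor_time = rng.exponential(scale=5.0, size=h)  # censoring times c_i (e.g., end of study)

tau = np.minimum(true_time, censor_time)          # observed time tau_i = min(t_i, c_i)
delta = (true_time <= censor_time).astype(int)    # delta_i = 1: event observed; 0: right-censored
```

Only (tau, delta) and the covariates X would be available to the model; the latent times are shown here purely for illustration.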
The accelerated failure time (AFT) model is treated as a linear regression between the survival time τi and the covariates Xi: G(τi) = β0 + Xi βT + εi, i = 1, 2, …, h, where G(⋅) is a known monotone transformation (typically the logarithm), β0 is the intercept, β = (β1, β2, …, βk) is the regression coefficient vector, and the εi are h independent random errors with a normal distribution function. Because of the censoring times in the datasets, the standard least squares approach cannot directly compute the regression parameters of the covariates in the AFT model.
In order to simplify the method, we use the mean imputation method [23] to estimate the right censored data in the least squares criterion. The estimated value G(τi) of the censoring survival time τi is given by:
(1)
where the t(⋅) are the distinct censored lifetimes in ascending order, r is the number of individuals at risk of failing just before time t(r), Ŝ(⋅) is the Kaplan-Meier estimator [24] of the survival function, and ΔŜ(t(r)) is the step of Ŝ(⋅) at time t(r). Therefore, the least squares approach of the AFT model is to minimize the loss function L(β) for the Gaussian family:
L(β) = ∑i=1…h (yi − Xi β)2, (2)

where the first column of X is all ones (so the intercept is absorbed into β), and each censored yi is replaced with the imputed value G(τi).
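The Kaplan-Meier-based mean imputation above can be sketched as follows. This is an illustrative implementation of the general idea only; the function names and the choice of the logarithm for G(⋅) are our assumptions, not the paper's exact code:

```python
import numpy as np

def kaplan_meier(tau, delta):
    """Kaplan-Meier survival estimates S(t) at the sorted observed times."""
    order = np.argsort(tau)
    tau, delta = tau[order], delta[order]
    n = len(tau)
    s, times, surv = 1.0, [], []
    for i in range(n):
        at_risk = n - i                      # individuals still at risk just before tau[i]
        if delta[i] == 1:                    # an observed event lowers the estimate
            s *= (at_risk - 1) / at_risk
        times.append(tau[i])
        surv.append(s)
    return np.array(times), np.array(surv)

def impute_censored(tau, delta):
    """Replace each censored log-time with a KM-weighted mean of later event log-times."""
    times, surv = kaplan_meier(tau, delta)
    prev = np.concatenate([[1.0], surv[:-1]])
    steps = prev - surv                      # step of S-hat at each time (nonzero at events)
    y = np.log(tau).astype(float)            # G(.) taken as the log, the usual AFT choice
    for i in range(len(tau)):
        if delta[i] == 0:                    # right-censored observation: impute
            later = times > tau[i]
            w = steps[later]
            if w.sum() > 0:
                y[i] = np.sum(w * np.log(times[later])) / w.sum()
    return y

tau = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([1, 0, 1, 1, 1])            # the time 2.0 is right-censored
y_imp = impute_censored(tau, delta)
```

The censored observation is replaced by an average of the later event times on the log scale, weighted by the Kaplan-Meier steps, so its imputed value always exceeds its own censored time.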
2.2 Path seeking algorithm for complex harmonic regularization penalty
Regularization is a way to avoid over-fitting in the AFT model, and the common form of regularization with a control parameter λ (λ > 0) is:

β̂ = arg minβ {L(β) + λ P(β)}, (3)

where β̂ denotes the estimated coefficients, L(β) is a loss function and P(β) represents the regularization term.
In fact, survival data have different probability distributions of grouping effect and sparsity. In theory, a strictly convex penalty function, such as ℓq (1 < q < 2), provides a sufficient condition for the grouping effect. On the contrary, the ℓp (0 < p < 1) penalty provides different sparsity evaluations for different p values. The limitation of the existing regularization methods is that a fixed-p (0 < p < 1) ℓp-norm with the ℓ2-norm is used to evaluate the grouping effect and sparsity in variable selection, so they often carry assumptions about the probability distribution of the data. Building upon our previous work, the naïve harmonic regularization that can approximate the ℓp (0 < p < 1) penalties [6], we designed the CHR penalty that can approximate the combination of the ℓp (0 < p < 1) and ℓq (1 ≤ q < 2) penalties [10]. The CHR penalty can normally be expressed as:
(4)
where 0 < a, b < 1 and λ1, λ2 ≥ 0.
Furthermore, in contrast to fixed p and q, the CHR penalty can suggest proper values for p and q on a given dataset, and the CHR penalty can be plotted as in Fig 2. When a is close to 0, m(β) ≈ |β| (ℓ1-norm, see Fig 2(c)). When a is close to 1, m(β) ≈ |β|1/2 (ℓ1/2-norm, see Fig 2(b)). When b is close to 0, n(β) ≈ |β|2 (ℓ2-norm, see Fig 2(e)). When b is close to 1, n(β) ≈ |β| (see Fig 2(f)), which coincides with m(⋅) when a is close to 0.
(a) the curves represent m(⋅) at different parameter a values; (b) the solid curve represents m(⋅) at the parameter a = 0.99, and the dashed curve is the ℓ1/2 regularization; (c) the solid curve represents m(⋅) at the parameter a = 0.01, and the dashed curve is the ℓ1 regularization; (d) the curves represent n(⋅) at different parameter b values; (e) the solid curve represents n(⋅) at the parameter b = 0.01, and the dashed curve is the ℓ2 regularization; (f) the solid curve represents n(⋅) at the parameter b = 0.99, and the dashed curve is the ℓ1 regularization.
Theorem 1. m(⋅) and n(⋅) approximate the combination of the ℓp (0 < p < 1) and ℓq (1 ≤ q < 2) regularizations with adjustable p and q to evaluate the grouping effect and sparsity of the data.

The derivations of the first two limits are straightforward; those of the other two are analogous and need not be detailed here.
Letting λ1 = (1 − γ)λ and λ2 = γλ in Eq (4), the common form of the CHR penalty can be re-expressed as:
(5)
Therefore, we can use the path seeking algorithm [25] for linear models to sequentially construct a path directly in parameter space that closely approximates that of the CHR penalty, without having to repeatedly solve a numerical optimization problem.
Let ν measure length along the path and Δν > 0 be a small increment. Here, we need to note that the size of the step Δν can be obtained by
(6)
Define

φj(ν) = −∂L(β(ν)) / ∂βj, (7)

ϕj(ν) = ∂P(β(ν)) / ∂|βj|, (8)

λj(ν) = φj(ν) / ϕj(ν), (9)

where λj(ν) is the ratio of the two gradients: φj(ν) of the loss function Eq (2) and ϕj(ν) of the penalty function with respect to |βj|. This path seeking scheme can accelerate solving the CHR penalty. The details of the implementation of the CHR penalty are outlined in Algorithm 1.
Algorithm 1 Implementation of CHR penalty
1: Initialize: β ← 0, ν ← 0
2: repeat
3: Compute λj(ν) via Eqs (7)–(9)
4: S ← {j : βj ≠ 0 AND sign(βj) ≠ sign(λj(ν))}
5: if S = empty then
6: j* = arg maxj |λj(ν)|
7: else
8: j* = arg maxj∈S |λj(ν)|
9: end if
10: βj* ← βj* + Δν × sign(λj*(ν))
11: ν ← ν + Δν
12: until λ(ν) = 0
After initializing the path, the vector λ(ν) is computed via Eqs (7)–(9) at each step. Then, those non-zero coefficients whose sign is opposite to that of their corresponding λj(ν) are identified. When the set S is empty, the coefficient corresponding to the largest component of λ(ν) in absolute value is selected at line 6. When there are one or more elements in the set S, the coefficient with the largest |λj(ν)| within this subset is selected instead. The selected coefficient βj* is then incremented by a small amount in the direction of the sign of its corresponding λj*(ν), with all other coefficients remaining unchanged, producing the solution for the next path point ν + Δν. Iterations continue until all components of λ(ν) are zero.
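To make the scheme concrete, here is a minimal path-seeking sketch for a squared loss with a plain ℓ1 penalty, where ∂P/∂|βj| = 1 and λj(ν) therefore reduces to the negative loss gradient. It illustrates the Friedman-style update loop above, not the full CHR variant:

```python
import numpy as np

def path_seek_l1(X, y, dv=0.05, n_steps=400):
    """Path seeking for squared loss + l1 penalty: each step increments the
    coefficient with the largest |lambda_j| by dv in the direction sign(lambda_j)."""
    h, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_steps):
        lam = X.T @ (y - X @ beta) / h          # lambda_j = phi_j / 1 for the l1 penalty
        # S: active coefficients whose sign opposes their lambda_j
        S = [j for j in range(k) if beta[j] != 0 and np.sign(beta[j]) != np.sign(lam[j])]
        jstar = max(S, key=lambda j: abs(lam[j])) if S else int(np.argmax(np.abs(lam)))
        if abs(lam[jstar]) < 1e-6:              # path finished: all gradients ~ 0
            break
        beta[jstar] += dv * np.sign(lam[jstar])
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0]                               # only the first feature is relevant
beta = path_seek_l1(X, y)
```

Each iteration costs only one gradient evaluation and one coordinate update, which is why the path can be traced without solving a full optimization problem at every λ.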
Although the complex harmonic penalized AFT model can adapt to different data distributions, it has three hyperparameters a, b, γ that are sensitive to the resolution. A more suitable way is therefore to optimize them with evolutionary algorithms, making these regularization hyperparameters more precise and efficient.
3 Complex harmonic regularization in a memetic framework
3.1 A wrapper-embedded memetic framework
The memetic framework [22] models memetic algorithms (MAs) as a process involving feature selection and a learning procedure. MAs, which combine evolutionary algorithms (EAs) with local search (LS) [26], have recently received much attention for feature selection problems. These methods are inspired by Darwin's principles of natural evolution and by Dawkins's notion of memes, which, unlike genes, can adapt themselves [27].
In most memetic-based feature selection approaches, an EA is used for wrapper feature selection and a LS algorithm is used for filter feature selection. Zhu et al [28] applied a genetic algorithm for wrapper feature selection and used the Markov blanket approach as a LS for filter feature selection. Noman and Iba [29] incorporated a crossover-based LS with adaptive length into DE, resulting in a DE variant where the length of the LS can be adjusted adaptively using a hill climbing heuristic. However, such memetic-based approaches have the potential limitation that filter evaluation measures may eliminate potentially useful features regardless of their performance in the wrapper approaches. In addition, the wrapper approaches usually involve a large number of assessments, and each assessment usually takes a considerable amount of time, especially when the numbers of features and instances are large. A second limitation of the existing memetic-based feature selection methods is that they are primarily concerned with relatively small numbers of features and instances.
Focusing on the limitations above, regularization methods can adapt to relationships in the data by designing different penalty functions with plain, grouping or network effects. What is more, regularization methods evaluate features and build the model in one stage. Therefore, we embed the CHR penalty into a DE variant to improve the selection ability through global optimization of the non-convex regularization.
3.2 Implementation of complex harmonic regularization with differential evolution (CHR-DE) algorithm
Our proposed wrapper-embedded feature selection approach (CHR-DE) in the memetic framework includes population initialization, differential mutation, crossover, adaptive local search and selection operations. The first step of the CHR-DE approach is that the DE population is randomly initialized, with each chromosome encoding the penalized hyperparameters (intron) and the coefficients of each gene in the AFT model (exon). Subsequently, the CHR approach (local search) is performed on the exon part under the fixed intron part, to reach a local optimal solution or to improve the fitness of individuals in the search population. DE operations are performed on the intron parts of the chromosomes, and the selection operator generates the next population. This process repeats until the stopping conditions are satisfied. The details of this approach are outlined in Algorithm 2.
Algorithm 2 The CHR-DE algorithm in memetic framework
Input:
Bounds of solution space hb, lb;
Population size NP;
Individual size ND;
Fitness function f(⋅); //Embedded with CHR penalty
Crossover rate cr;
Scaling factor F;
Output: Regression coefficient β*.
1: Generate initial population //Begin DE procedure
2: pop ← rand(NP, ND) × (hb − lb) + lb
3: for i = 1: NP do
4: Calculate f(pop(i))
5: end for
6: repeat
7: Select popr, pops, popt randomly from pop
8: //Differential mutation
9: for i = 1: NP do
10: child(i) ← popr + F × (pops − popt)
11: //Crossover
12: jrand = ⌊rand × ND⌋
13: for j = 1: ND do
14: if rand < cr OR j == jrand then
15: offspring(i)(j) ← child(i)(j)
16: else
17: offspring(i)(j) ← pop(i)(j)
18: end if
19: end for
20: //Selection
21: if f(offspring(i)) ≥ f(pop(i)) then
22: pop(i) ← offspring(i)
23: end if
24: end for
25: //Adaptive local search
26: tmpPop ← mean(pop) + wL(pop − mean(pop))
27: for i = 1: NP do
28: for j = 1: NP − 1 do
29:
30: end for
31: C(1) ← 0
32: for j = 2: NP do
33: C(j) ← r(j − 1)(tmpPop(j − 1) − tmpPop(j) + C(j − 1))
34: end for
35: offspring ← tmpPop(NP) + C(NP)
36: if offspring ∈ (hb, lb) AND f(offspring) ≥ f(pop(i)) then
37: pop(i) ← offspring
38: end if
39: end for
40: until stopping criterion is met
3.2.1 Chromosome representation: Intron and exon.
The first step of the CHR-DE approach is to randomly initialize the population of NP individuals, where each chromosome adopts the "intron + exon" encoding [13] to hold the penalized hyperparameters (intron) and the coefficients of each gene in the AFT model (exon), i.e., c = (a, b, γ, β1, β2, ⋯, βk). In the CHR scheme, the three parameters of the intron part should cover their range by uniformly randomizing individuals between the minimum and maximum bounds lb, hb of the search space. DE searches for a global optimum in the intron part, which is an ND-dimensional real parameter space:

pop(i, j) = lb(j) + rand × (hb(j) − lb(j)), i = 1, …, NP, j = 1, …, ND, (10)

where rand is a uniformly distributed random number lying between 0 and 1. Meanwhile, the CHR is performed on the exon part β for each intron in the individuals, to reach a local optimal solution and to obtain the fitness of each individual.
3.2.2 Fitness definition.
The mean squared error (MSE) and the concordance index (CI) are two criteria used to design the fitness function. In statistics, the MSE measures the average of the squares of the errors, which is evaluated by Eq (11) for survival data:

MSE = (1/h) ∑i=1…h (G(τi) − ŷi)2, (11)

where the predicted value ŷi = Xi β̂.
In survival analysis, the CI is the standard performance measure for model assessment and quantifies the quality of rankings by Eq (12).
(12)
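The CI in Eq (12) counts the fraction of comparable pairs that the model orders correctly. A Harrell-style sketch handling right censoring (our illustrative implementation of the standard measure):

```python
import numpy as np

def concordance_index(tau, delta, pred):
    """Fraction of comparable pairs ordered correctly by the predictions.
    A pair (i, j) is comparable when i has the earlier time and is uncensored."""
    num, den = 0.0, 0
    h = len(tau)
    for i in range(h):
        if delta[i] == 0:              # censored: not a usable earlier event
            continue
        for j in range(h):
            if tau[i] < tau[j]:        # comparable pair
                den += 1
                if pred[i] < pred[j]:
                    num += 1.0         # concordant: shorter predicted time for i
                elif pred[i] == pred[j]:
                    num += 0.5         # ties get half credit
    return num / den if den else 0.0
```

A CI of 1.0 means perfect ranking, 0.5 is random, and 0.0 is perfectly reversed ranking.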
We employ the weighted-sum method [30] to change this bi-objective problem into a single objective problem. Thus, the individual with low MSE and high CI produces a high fitness value by Eq (13).
(13)
where wM is the weight of the MSE for individual i in the population and wC is the weight of the CI for this individual. These weight factors can be adjusted according to what is considered important, e.g., if the MSE is more important than the CI, we set the weight factors wM = 95%, wC = 5%. Furthermore, the results with different values of wM and wC can be found in the S1 Appendix.
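One plausible weighted-sum scalarization with these properties (low MSE and high CI both raising the fitness) is sketched below; the 1/(1 + MSE) transform is our assumption, not the paper's exact Eq (13):

```python
def fitness(mse, ci, w_m=0.95, w_c=0.05):
    """Hypothetical weighted-sum fitness: the MSE enters through 1/(1 + MSE) so
    that a lower error and a higher concordance index both increase fitness."""
    return w_m * (1.0 / (1.0 + mse)) + w_c * ci
```

Any monotone-decreasing transform of the MSE would serve the same purpose; the key point is that the bi-objective (MSE, CI) pair collapses into one scalar that DE can maximize.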
3.2.3 Differential mutation operation.
After initialization, DE uses a differential mutation operator based on a linear combination:

child(i) = popr + F × (pops − popt), (14)

where the indices r, s, t are mutually exclusive integers randomly generated within the range [1, NP]. These indices are generated anew for each mutant vector child. The scaling factor F is a positive value, usually not much greater than 1, that scales the difference vector [31].
3.2.4 Crossover operation.
To enhance the potential diversity of the population, a crossover operation is applied to each pair of the target vector pop and its corresponding mutant vector child to generate a trial vector offspring. We employ the binomial (uniform) crossover to create a single trial vector. This crossover is defined for each jth component of the ith parameter vector as follows:

offspring(i)(j) = child(i)(j) if rand < cr OR j = jrand, otherwise pop(i)(j), (15)

where jrand ∈ [1, 2, ⋯, ND] is a randomly chosen index, which ensures that offspring gets at least one component from child.
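Eqs (14) and (15) together define the classic DE/rand/1/bin trial-vector construction, which can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)

def de_trial(pop, i, F=0.9, cr=0.9):
    """Build one trial vector: differential mutation (Eq (14) style) followed
    by binomial crossover with the target vector (Eq (15) style)."""
    NP, ND = pop.shape
    # three mutually exclusive indices, all different from the target i
    r, s, t = rng.choice([k for k in range(NP) if k != i], size=3, replace=False)
    child = pop[r] + F * (pop[s] - pop[t])     # differential mutation
    jrand = rng.integers(ND)                   # guarantees >= 1 mutant component
    mask = rng.random(ND) < cr
    mask[jrand] = True
    return np.where(mask, child, pop[i])       # binomial (uniform) crossover

pop = rng.standard_normal((6, 4))
trial = de_trial(pop, 0, cr=0.0)               # cr = 0: only the j_rand component mutates
```

With cr = 0 the trial vector differs from the target in exactly one (the jrand) component, which is how Eq (15) prevents a trial from duplicating its target.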
3.2.5 Adaptive local search.
Usually in EAs the solutions with better fitness values are preferred for reproduction, so we use an adaptive simplex crossover local search strategy to explore the neighborhood of the best individual of the population. First, we expand the population with the simplex crossover:

tmpPop = mean(pop) + wL × (pop − mean(pop)), (16)

where wL is the control parameter of this local search. Then the offspring is generated from the expanded population via Eqs (17) and (18):

C(1) = 0, C(j) = r(j − 1) × (tmpPop(j − 1) − tmpPop(j) + C(j − 1)), j = 2, …, NP, (17)

offspring = tmpPop(NP) + C(NP). (18)
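The expansion step of Eq (16) is a reflection of each individual about the population centroid; a minimal sketch:

```python
import numpy as np

def simplex_expand(pop, w_l=1.5):
    """Eq (16)-style expansion: move every individual away from (w_l > 1) or
    toward (w_l < 1) the population mean; w_l = 1 leaves the population as is."""
    center = pop.mean(axis=0)
    return center + w_l * (pop - center)

rng = np.random.default_rng(2)
pop = rng.standard_normal((4, 3))
expanded = simplex_expand(pop, w_l=2.0)
```

Note that the expansion preserves the population mean while rescaling the spread by wL, so it widens (or narrows) the simplex around the centroid without shifting the search region.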
3.2.6 Selection operation.
The solutions with better fitness values are generally preferred for reproduction, as they are more likely to be in the proximity of a basin of attraction. Therefore, we deterministically select the best individual of the population for exploring its neighborhood using the selection operation that is described as
pop(i) ← offspring(i) if f(offspring(i)) ≥ f(pop(i)), otherwise pop(i), (19)
where f(⋅) is the fitness function in Eq (13) to be maximized. Therefore, if the new trial vector yields an equal or higher value of the fitness function, it replaces the corresponding target vector in the next generation; otherwise the target is retained in the population. Hence, the population either gets better or remains the same in fitness status, but never deteriorates.
4 Results and discussion
4.1 Synthetic datasets
To demonstrate the performance of our proposed regularization procedure, we assume graph modules with 200 key factors (KFs), each regulating 10 different genes, for a total of 2,200 variables. Among these KFs and genes, 4 KFs and their regulated genes (44 variables in total) are associated with the response based on the following model:
(20)
where the independent random noise ε ∼ N(0, 1), and the non-zero coefficients are specified as
For each KF, the X value is simulated from a N(0, 1) distribution, and conditional on the value of the KF, we simulate the expression levels of the genes it regulates from conditional normal distributions with correlation ϱ of 0.2, 0.5, 0.7, and 0.9, respectively. For example, if x1 is the KF of xi, i = 2, 3, ⋯, 10, then this group is defined as xi = ϱ × x1 + (1 − ϱ) × xi. Therefore, we have a total of 2,200 variables, 44 of which are relevant.
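The grouping scheme xi = ϱ × x1 + (1 − ϱ) × xi can be simulated as follows (a sketch; the sizes and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_group(h, n_genes, rho):
    """One key factor (KF) and the genes it regulates: each gene mixes the KF
    signal with independent noise, so within-group correlation grows with rho."""
    kf = rng.standard_normal(h)                        # KF ~ N(0, 1)
    noise = rng.standard_normal((h, n_genes))
    genes = rho * kf[:, None] + (1 - rho) * noise      # x_i = rho*x_1 + (1-rho)*e_i
    return kf, genes

kf, genes = simulate_group(h=2000, n_genes=9, rho=0.9)
corr = np.array([np.corrcoef(kf, genes[:, j])[0, 1] for j in range(9)])
```

At ϱ = 0.9 the KF-gene correlation is close to 0.9, while at ϱ = 0.2 it falls near 0.05, which is exactly the range of grouping strengths the benchmark varies.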
All the penalties in our experiments are solved by the general path seeking method [25]. The original DE for feature subset selection was conducted by Khushaba et al. [32]. For each model, we use two-thirds of the simulated data for training and the remaining one-third for testing, with 600 samples. A 10-fold cross validation (CV) is conducted on the training set for tuning the parameters of all approaches. In our experiments, the scaling factor F = 0.9, the crossover rate cr = 0.9, and the weight factors wM = 95%, wC = 5%, wL = 1, respectively. Because the population size should be small [29], we set NP = 4, with a stopping criterion of 10,000 evaluations. In addition, we also calculate the sensitivity and specificity of each procedure, where
sensitivity = TP / (TP + FN), (21)

specificity = TN / (TN + FP), (22)

with TP, FN, TN and FP denoting the numbers of true positives, false negatives, true negatives and false positives among the selected variables.
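Applied to variable selection, the TP/FN/TN/FP counts compare the estimated and true supports; a small helper illustrating Eqs (21) and (22):

```python
import numpy as np

def sens_spec(beta_hat, beta_true):
    """Selection sensitivity and specificity from estimated vs. true supports."""
    sel = np.asarray(beta_hat) != 0            # variables the method selected
    rel = np.asarray(beta_true) != 0           # truly relevant variables
    tp = np.sum(sel & rel)                     # relevant and selected
    fn = np.sum(~sel & rel)                    # relevant but missed
    tn = np.sum(~sel & ~rel)                   # irrelevant and excluded
    fp = np.sum(sel & ~rel)                    # irrelevant but selected
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

A method that selects only KFs but drops their regulated genes scores high specificity but low sensitivity, which is the pattern reported for the non-grouping penalties below.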
To further evaluate the performance of each penalty, we employ the prediction mean-squared error (MSE) and the concordance index (CI) with standard errors.
After repeating each penalty 50 times, the averaged results are summarized in Table 1. In general, our proposed CHR-DE approach gives lower MSE and higher CI than the other approaches. CHR-DE also achieves much higher sensitivity with comparable specificity for identifying the relevant features. The Lasso and ℓ1/2 penalties without the ℓ2-norm have strong selectivity, especially in the high grouping effect data ϱ = 0.7, 0.9. As the correlation ϱ among genes increases, these penalties without a grouping effect select only a few genes; e.g., the sensitivity of ℓ1/2 drops from 0.790 to 0.091 (selecting only the 4 non-zero-coefficient KFs), with the highest specificity of 0.998. The wrapper methods DE and CMA-ES have weaker selectivity than the grouping effect penalties, e.g., Elastic net, ℓ1/2 + ℓ2 and CHR, especially in the data containing low-correlation features ϱ = 0.2. Although the grouping effect penalties have lower specificity, they perform well and select more of the correct genes with non-zero coefficients β, regardless of the conditional correlation ϱ. Compared with tuning the CHR's hyperparameters by grid search (CHR-GS), CHR-DE utilizes the evolutionary algorithm to skip redundant parameter settings or to add new ones, and ultimately achieves better performance.
Standard errors are given in parentheses.
4.2 Real datasets
We demonstrate the proposed methods by analyzing microarray expression data from NCBI's gene expression omnibus (GEO), including breast cancer (GSE22210) [33], hepatocellular carcinoma (HCC, GSE10141) [34] and colorectal cancer (CRC, GSE103479). To evaluate our CHR-DE method, we randomly divide these datasets so that two-thirds of the samples form the training set and the remainder the test set. The details of the above datasets are shown in Table 2. Besides, Figs 3–5 show the pathways of some genes selected by the CHR-DE method in the three different cancers, rendered with cBioPortal [35]. The query genes are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are displayed as hexagons, and orange indicates FDA-approved.
The genes selected by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are displayed as hexagons, and orange indicates FDA-approved.
The genes selected by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are displayed as hexagons, and orange indicates FDA-approved.
The genes selected by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in colorectal cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer.
4.2.1 Breast cancer.
GSE22210 contains 167 breast tumor samples with 1,452 genes, obtained using GEO Platform GPL9183 [33]. Table 3 shows that CHR-DE performs best in predicting the patients' survival time while selecting a smaller number of genes than the Elastic net and CHR-GS.
As seen from Table 4, the CHR-DE penalty selects some unique genes, such as HIC1 and LIF, which play an important role in the development of primary breast cancer [36, 37]. XIST is selected by all 8 different methods, and the lack of an X chromosome decorated by XIST RNA causes the basal-like subtype of invasive breast carcinoma [38]. Moreover, some relevant genes selected by other regularization models, such as IL1B, NFKB1, IGF1R and SERPINB2, are also found by CHR-DE. In particular, IL1B, NFKB1 and IGF1R form a small network group found by the CHR-DE method, as shown in Fig 3, and they are also targeted by several cancer drugs. IL1B leads to enhanced production of proinflammatory cytokines triggered by the treatment, with subsequent effects on persistent fatigue in the aftermath of breast cancer [39]. Wood et al [40] identified an NFKB1 mutation in breast tumorigenesis. As one of the related receptors in the insulin-like growth factor (IGF) system, the type I IGF receptor (IGF1R) can influence the activity of estrogen receptor-α (ER), which can be used in promoting breast tumor regression [41]. The plasminogen activator inhibitor type 2 (PAI2, SERPINB2) is significantly associated with increased survival in patients with breast cancer [42, 43].
4.2.2 Hepatocellular carcinoma.
GSE10141 contains 6,144 genes for 80 hepatocellular carcinoma (HCC) patients. Table 5 also shows that CHR-DE performs best in predicting the patients' survival time while selecting a smaller number of genes than the Elastic net and CHR-GS.
As seen from Table 6, the CHR-DE penalty selects some unique genes, such as KRT14 and NOLC1. Liver cytokeratin 14 (KRT14), a marker of liver stem cells, is only positive in the G0 phase of the hepatocellular carcinoma cell line Huh7 [44]. NOLC1 is regulated by the CREB-NOLC1 pathway, which plays an important role in hepatocellular carcinoma progression by modulating tumor growth, angiogenesis and apoptosis [45, 46]. Furthermore, ADRB3, MAPK3, MGAT1, TGFBI and DAD1 are selected by the CHR-DE penalty as well as by other methods such as Lasso, ℓ1/2, DE, CMA-ES and CHR-GS. In particular, ADRB3 and MAPK3 form a small network group found by the CHR-DE method, as shown in Fig 4, and they are also targeted by several cancer drugs. Zhao et al [47] identified two pathways, "calcium signaling pathway" and "neuroactive ligand-receptor interaction", containing ADRB3, which correlate with the middle and late stages of HCC development. Okabe et al [48] suggested that activation of the MAPK pathway containing MAPK3 and MAPK9 is a common feature of HCC. Guo et al [49] reported that alterations of glycogenes and N-glycans such as MGAT1 in human hepatocarcinoma cells correlate with tumor invasion, tumorigenicity and sensitivity to chemotherapeutic drugs. As a tumor suppressor, arginylglycylaspartic acid (RGD) peptides released from βig-H3, also known as transforming growth factor-beta-induced protein (TGFBI), mediate apoptosis of Hep3B hepatoma cells [50], although βig-H3 can also promote the progression of hepatocellular carcinoma [51, 52]. Tanaka et al [53] demonstrated that high expression of DAD1 in HCC cells can activate oligosaccharyltransferase (OST) and block apoptosis, thereby enhancing tumor cell survival.
4.2.3 Colorectal cancer.
GSE103479 contains 110,961 genes for 155 colorectal cancer (CRC) patients. Table 7 also shows that CHR-DE performed best in predicting the patients' survival time while selecting a smaller number of genes than the Elastic net and CHR-GS.
As seen from Table 8, CDC42 is selected by the CHR-DE penalty and other methods. It is one of the best-characterized members of the Rho GTPase family and was found to be up-regulated in several types of human tumors, including CRC; targeting CDC42 could potentially decrease CRC metastasis formation [54, 55, 56]. Furthermore, four selected genes, CDC42, SLC10A2, TNRC6B and MOV10, form a small network group selected by the CHR-DE method, as shown in Fig 5. The ileal sodium-dependent bile acid transporter (ISBT; gene symbol SLC10A2) has been associated with the risk of developing sporadic colorectal adenoma, a precursor lesion of CRC [57]. ATN1 may be a promising biomarker for distinguishing serrated from conventional CRC [58]. These two genes, SLC10A2 and ATN1, are selected by both the CHR-DE penalty and Lasso. RPS11 is selected by all 6 different penalties. Kasai et al [59] demonstrated that RPS11 is highly expressed in CRC (especially in immature mucosal cells located in the crypt base) but can hardly be detected in the normal colorectal mucosa.
5 Conclusion
In this paper, we have proposed a penalized accelerated failure time model, CHR-DE, to recognize biomarkers that are both biologically meaningful and clinically relevant. This model is designed on a wrapper-embedded memetic framework that combines non-convex regularization (local search) with differential evolution (global search). First, the new method inherits the power of regularization methods, which integrate feature selection and the learning procedure into a single process. Furthermore, our proposed method utilizes differential evolution (DE) to globally optimize the CHR hyperparameters, which gives CHR-DE a strong capability for selecting groups of genes in high-dimensional biological data. We also developed an efficient path-seeking algorithm to optimize this penalized model. The results on both synthetic and real datasets indicate that the CHR-DE method is highly competitive with existing feature selection approaches for selecting biomarkers in groups. Additionally, the CHR-DE scheme can easily be applied to other high-dimensional, low-sample-size datasets.
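To make the global-search component concrete, the following minimal sketch implements the classical DE/rand/1/bin scheme and uses it to tune three hyperparameters against a validation objective. The quadratic `val_error` is a hypothetical stand-in for the cross-validated error of the CHR-penalized AFT model; all names, bounds, and settings here are illustrative assumptions, not the authors' implementation.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimal DE/rand/1/bin minimizer over box-constrained parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation: donor vector v = x_a + F * (x_b - x_c), clipped to bounds.
            donor = [min(max(pop[a][d] + F * (pop[b][d] - pop[c][d]),
                             bounds[d][0]), bounds[d][1]) for d in range(dim)]
            # Binomial crossover with one guaranteed donor coordinate.
            jrand = rng.randrange(dim)
            trial = [donor[d] if (rng.random() < CR or d == jrand) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection: keep the trial if it is no worse.
            f_trial = objective(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = min(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]

# Hypothetical stand-in for a cross-validated error surface whose minimum
# sits at (lambda1, lambda2, q) = (0.1, 0.5, 1.5).
def val_error(theta):
    target = (0.1, 0.5, 1.5)
    return sum((t - s) ** 2 for t, s in zip(theta, target))

best, err = differential_evolution(
    val_error, bounds=[(0.0, 1.0), (0.0, 1.0), (1.0, 2.0)])
```

The greedy one-to-one replacement is what makes DE monotone in the best fitness; in the actual CHR-DE pipeline, each fitness evaluation would invoke the path-seeking algorithm on the penalized AFT model instead of this toy objective.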
Supporting information
S1 Appendix. The results with different values of MSE and CI weights.
We display the results with different weightings in synthetic datasets and breast cancer data (GSE22210).
https://doi.org/10.1371/journal.pone.0210786.s001
(PDF)
Acknowledgments
The authors thank Dr. Xiao-Ying Liu and Dr. Zi-Yi Yang for excellent technical assistance. This work is supported by the Macau Science and Technology Development Fund (Grant No. 003/2016/AFJ) of the Macao SAR of China and by China NSFC project under contract 61661166011.
References
- 1. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58(1):267–288.
- 2. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association. 2001;96(456):1348–1360.
- 3. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006;68(1):49–67.
- 4. Zhang CH. Nearly unbiased variable selection under minimax concave penalty. The Annals of statistics. 2010;38(2):894–942.
- 5. Xu Z, Chang X, Xu F, Zhang H. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Transactions on neural networks and learning systems. 2012;23(7):1013–1027. pmid:24807129
- 6. Chu GJ, Liang Y, Wang JX. Novel Harmonic Regularization Approach for Variable Selection in Cox’s Proportional Hazards Model. Computational and mathematical methods in medicine. 2014;2014.
- 7. Zeng L, Xie J. Group variable selection via SCAD-L2. Statistics. 2014;48(1):49–66.
- 8. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2005;67(2):301–320.
- 9. Huang HH, Liu XY, Liang Y. Feature Selection and Cancer Classification via Sparse Logistic Regression with the Hybrid L1/2+2 Regularization. PloS one. 2016;11(5):e0149675. pmid:27136190
- 10. Liu XY, Wang S, Zhang H, Zhang H, Yang ZY, Liang Y. Novel regularization method for biomarker selection and cancer classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics (Accept). 2019.
- 11. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
- 12. Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, et al. Population Based Training of Neural Networks. arXiv preprint arXiv:1711.09846. 2017.
- 13. Liu XY, Liang Y, Wang S, Yang ZY, Ye HS. A Hybrid Genetic Algorithm With Wrapper-Embedded Approaches for Feature Selection. IEEE Access. 2018;6:22863–22874.
- 14. Lanzi PL. Fast feature selection with genetic algorithms: a filter approach. In: Proceedings of the IEEE International Conference on Evolutionary Computation. IEEE; 1997. p. 537–540.
- 15. Kennedy J. Particle swarm optimization. In: Encyclopedia of Machine Learning. Springer; 2011. p. 760–766.
- 16. Storn R, Price K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of global optimization. 1997;11(4):341–359.
- 17. Vesterstrom J, Thomsen R. A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In: Proceedings of the 2004 Congress on Evolutionary Computation (CEC2004). vol. 2. IEEE; 2004. p. 1980–1987.
- 18. Nguyen QH, Ong YS, Meng HL. A probabilistic memetic framework. IEEE Transactions on evolutionary Computation. 2009;13(3):604–623.
- 19. Hansen N, Ostermeier A. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In: Proceedings of the IEEE International Conference on Evolutionary Computation. IEEE; 1996. p. 312–317.
- 20. Bosman PA, Thierens D. Linkage neighbors, optimal mixing and forced improvements in genetic algorithms. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation. ACM; 2012. p. 585–592.
- 21. Bouter A, Alderliesten T, Witteveen C, Bosman PA. Exploiting linkage information in real-valued optimization with the real-valued gene-pool optimal mixing evolutionary algorithm. In: Proceedings of the Genetic and Evolutionary Computation Conference. ACM; 2017. p. 705–712.
- 22. Neri F, Cotta C. Memetic algorithms and memetic computing optimization: A literature review. Swarm and Evolutionary Computation. 2012;2:1–14.
- 23. Datta S, Le-Rademacher J, Datta S. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics. 2007;63(1):259–271. pmid:17447952
- 24. Datta S. Estimating the mean life time using right censored data. Statistical Methodology. 2005;2(1):65–69.
- 25. Friedman JH. Fast sparse regression and classification. International Journal of Forecasting. 2012;28(3):722–738.
- 26. Merz P, Freisleben B. Memetic algorithms for the traveling salesman problem. Complex Systems. 2001;13(4):297–346.
- 27. Dawkins R. The Selfish Gene. Oxford University Press; 2016.
- 28. Zhu Z, Ong YS, Dash M. Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognition. 2007;40(11):3236–3248.
- 29. Noman N, Iba H. Accelerating differential evolution using an adaptive local search. IEEE Transactions on evolutionary Computation. 2008;12(1):107–125.
- 30. Deb K. Multi-objective optimization. In: Search Methodologies. Springer; 2014. p. 403–449.
- 31. Price K, Storn RM, Lampinen J. Differential Evolution: A Practical Approach to Global Optimization. Springer-Verlag; 2005.
- 32. Khushaba RN, Al-Ani A, Al-Jumaily A. Feature subset selection using differential evolution and a statistical repair mechanism. Expert Systems with Applications. 2011;38(9):11515–11526. https://doi.org/10.1016/j.eswa.2011.03.028.
- 33. Holm K, Hegardt C, Staaf J, Vallon-Christersson J, Jönsson G, Olsson H, et al. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast cancer research. 2010;12(3):R36. pmid:20565864
- 34. Villanueva A, Hoshida Y, Battiston C, Tovar V, Sia D, Alsinet C, et al. Combining clinical, pathology, and gene expression data to predict recurrence of hepatocellular carcinoma. Gastroenterology. 2011;140(5):1501–1512. pmid:21320499
- 35. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Science Signaling. 2013;6(269):pl1. pmid:23550210
- 36. Fujii H, Biel MA, Zhou W, Weitzman SA, Baylin SB, Gabrielson E. Methylation of the HIC-1 candidate tumor suppressor gene in human breast cancer. Oncogene. 1998;16(16). pmid:9572497
- 37. Shin JE, Park SH, Jang YK. Epigenetic up-regulation of leukemia inhibitory factor (LIF) gene during the progression to breast cancer. Molecules and cells. 2011;31(2):181–189. pmid:21191816
- 38. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, et al. X chromosomal abnormalities in basal-like human breast cancer. Cancer cell. 2006;9(2):121–132. pmid:16473279
- 39. Collado-Hidalgo A, Bower JE, Ganz PA, Irwin MR, Cole SW. Cytokine gene polymorphisms and fatigue in breast cancer survivors: Early findings. Brain, behavior, and immunity. 2008;22(8):1197–1200. pmid:18617366
- 40. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007;318(5853):1108–1113. pmid:17932254
- 41. Fagan DH, Yee D. Crosstalk between IGF1R and estrogen receptor signaling in breast cancer. Journal of mammary gland biology and neoplasia. 2008;13(4):423. pmid:19003523
- 42. Duffy MJ. The urokinase plasminogen activator system: role in malignancy. Current pharmaceutical design. 2004;10(1):39–49. pmid:14754404
- 43. Foekens JA, Peters HA, Look MP, Portengen H, Schmitt M, Kramer MD, et al. The urokinase system of plasminogen activation and prognosis in 2780 breast cancer patients. Cancer research. 2000;60(3):636–643. pmid:10676647
- 44. Kamohara Y, Haraguchi N, Mimori K, Tanaka F, Inoue H, Mori M, et al. The search for cancer stem cells in hepatocellular carcinoma. Surgery. 2008;144(2):119–124. pmid:18656616
- 45. Gao X, Wang Q, Li W, Yang B, Song H, Ju W, et al. Identification of nucleolar and coiled-body phosphoprotein 1 (NOLC1) minimal promoter regulated by NF-κB and CREB. BMB reports. 2011;44(1):70–75. pmid:21266110
- 46. Abramovitch R, Tavor E, Jacob-Hirsch J, Zeira E, Amariglio N, Pappo O, et al. A pivotal role of cyclic AMP-responsive element binding protein in tumor progression. Cancer research. 2004;64(4):1338–1346. pmid:14973073
- 47. Zhao Y, Xue F, Sun J, Guo S, Zhang H, Qiu B, et al. Genome-wide methylation profiling of the different stages of hepatitis B virus-related hepatocellular carcinoma development in plasma cell-free DNA reveals potential biomarkers for early detection and high-risk monitoring of hepatocellular carcinoma. Clinical epigenetics. 2014;6(1):30. pmid:25859288
- 48. Okabe H, Satoh S, Kato T, Kitahara O, Yanagawa R, Yamaoka Y, et al. Genome-wide analysis of gene expression in human hepatocellular carcinomas using cDNA microarray. Cancer research. 2001;61(5):2129–2137. pmid:11280777
- 49. Guo R, Cheng L, Zhao Y, Zhang J, Liu C, Zhou H, et al. Glycogenes mediate the invasive properties and chemosensitivity of human hepatocarcinoma cells. The international journal of biochemistry & cell biology. 2013;45(2):347–358.
- 50. Kim JE, Kim SJ, Jeong HW, Lee BH, Choi JY, Park RW, et al. RGD peptides released from βig-h3, a TGF-β-induced cell-adhesive molecule, mediate apoptosis. Oncogene. 2003;22(13):2045–2053. pmid:12673209
- 51. Tang J, Zhou Hw, Jiang Jl, Yang Xm, Li Y, Zhang HX, et al. βig-h3 is involved in the HAb18G/CD147-mediated metastasis process in human hepatoma cells. Experimental biology and medicine. 2007;232(3):344–352. pmid:17327467
- 52. Tang J, Wu YM, Zhao P, Jiang JL, Chen ZN. βig-h3 interacts with α3β1 integrin to promote adhesion and migration of human hepatoma cells. Experimental Biology and Medicine. 2009;234(1):35–39. pmid:18997105
- 53. Tanaka K, Kondoh N, Shuda M, Matsubara O, Imazeki N, Ryo A, et al. Enhanced expression of mRNAs of antisecretory factor-1, gp96, DAD1 and CDC34 in human hepatocellular carcinomas. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease. 2001;1536(1):1–12.
- 54. Arias-Romero LE, Chernoff J. Targeting Cdc42 in cancer. Expert opinion on therapeutic targets. 2013;17(11):1263–1273. pmid:23957315
- 55. Li Y, Zhu X, Xu W, Wang D, Yan J. miR-330 regulates the proliferation of colorectal cancer cells by targeting Cdc42. Biochemical and biophysical research communications. 2013;431(3):560–565. pmid:23337504
- 56. Ke TW, Hsu HL, Wu YH, Chen WTL, Cheng YW, Cheng CW. MicroRNA-224 suppresses colorectal cancer cell migration by targeting Cdc42. Disease markers. 2014;2014. pmid:24817781
- 57. Wang W, Xue S, Ingles SA, Chen Q, Diep AT, Frankl HD, et al. An association between genetic polymorphisms in the ileal sodium-dependent bile acid transporter gene and the risk of colorectal adenomas. Cancer Epidemiology and Prevention Biomarkers. 2001;10(9):931–936.
- 58. Chen H, Fang Y, Zhu H, Li S, Wang T, Gu P, et al. Protein-protein interaction analysis of distinct molecular pathways in two subtypes of colorectal carcinoma. Molecular medicine reports. 2014;10(6):2868–2874. pmid:25242495
- 59. Kasai H, Nadano D, Hidaka E, Higuchi K, Kawakubo M, Sato TA, et al. Differential expression of ribosomal proteins in human normal and neoplastic colorectum. Journal of Histochemistry & Cytochemistry. 2003;51(5):567–573.