Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models

Survival prediction from a large number of covariates is a current focus of statistical and medical research. In this paper, we study a methodology known as the compound covariate prediction performed under univariate Cox proportional hazard models. We demonstrate via simulations and real data analysis that the compound covariate method generally competes well with ridge regression and Lasso methods, both already well-studied methods for predicting survival outcomes with a large number of covariates. Furthermore, we develop a refinement of the compound covariate method by incorporating likelihood information from multivariate Cox models. The new proposal is an adaptive method that borrows information contained in both the univariate and multivariate Cox regression estimators. We show that the new proposal has a theoretical justification from a statistical large sample theory and is naturally interpreted as a shrinkage-type estimator, a popular class of estimators in statistical literature. Two datasets, the primary biliary cirrhosis of the liver data and the non-small-cell lung cancer data, are used for illustration. The proposed method is implemented in R package “compound.Cox” available in CRAN at http://cran.r-project.org/.


Introduction
Predicting survival outcomes in the presence of a large number of covariates has received much attention in the recent decade.The prominent motivation for this comes from predictions of patient survival based on gene expression profiles.For example, gene expression profiles have been used to improve the prediction power of the clinical outcomes for breast cancer patients [1,2,3,4] and lung cancer patients [5,6,7].Utilizing gene profiles, van't Veer et al. [3] provided a criterion for selecting patients who would benefit from adjuvant therapy, which reduces patients' risks over traditional guidelines based on histological and clinical characteristics.Chen et al. [6] examined 672 gene profiles for non-small-cell lung cancer patients to identify a gene signature closely related to survival.Even without gene expression profiles, patients data often include a large number of clinical, serologic and histologic characteristics.Hence, it is of interest to efficiently utilize a large number of covariates to predict clinical outcomes.
A statistical challenge arises if the number of covariates p is large relative to the number of individuals n.The problem becomes further involved with the presence of censoring.The standard regression techniques in the presence of censoring, including the Cox regression analysis [8], fail to provide a satisfactory result.
Two types of strategies have been commonly used to perform survival prediction with a panel of covariate data.The first strategy is to select subsets of covariates by univariate survival analyses [1,6] or various clustering algorithms [9].Then, one can apply standard methods for prediction.The second strategy for resolving high-dimensionality utilizes some penalizing schemes on the Cox regression analysis.In particular, the Lasso [10,11,12] and ridge regression [13,14] are obtained by penalizing the Cox's partial likelihood function with L 1 and L 2 penalties, respectively.The two types of penalization yield p regression coefficients that are shrunk toward zero.
In this paper, we study a methodology known as the compound covariate prediction.The compound covariate prediction method is based on a linear combination of the univariate Cox regression estimates and has been previously used in medical studies with microarrays [5,6,15,16].However, few papers have investigated its statistical properties and comparative performance with other methods.For instance, recent comparative studies of Bovelstad et al. [17], van Wieringen et al. [18], and Bovelstad and Borgan [19] have all demonstrated that ridge regression has the overall best predictive performance among many well-known survival prediction methods, including univariate selection, forward selection, Lasso, principal components, supervised principal components, partial least squares, random forests, etc., but excluding the compound covariate method.Additionally, the compound covariate prediction can be a powerful method even for more traditional survival data that may not involve microarrays, as we will see in the analysis of the primary biliary cirrhosis of the liver data.Hence, the first objective of this paper is to study the statistical properties and comparative performance of the compound covariate method, in order to fill a gap in the current literature and highlight the competitive performance of the compound covariate method with other methods.
The second objective of this paper is to develop a new statistical methodology that refines the compound covariate method.This methodology aims to incorporate the combined predictive information of covariates into a compound covariate predictor by forming a mixture of multivariate and univariate Cox partial likelihoods.Such a method is shown to have a theoretical justification under a statistical large sample theory, and is naturally interpreted as a shrinkage-type estimator, a popular class of estimators in statistical literature.
We also compare the compound covariate and the newly proposed methods with the benchmark methods of ridge regression and Lasso analyses via Monte Carlo simulations and real data analysis.The primary biliary cirrhosis of the liver data and the non-small-cell lung cancer data are used for illustration.All the numerical performances of the methods are evaluated via cross-validated schemes.

Existing Methods
To facilitate the subsequent discussions, we shall introduce existing methods for predicting survival outcomes.Let x i ~( x i1 , :::, x ip )' be a p-dimensional vector of covariates from individual i.We observe ( t i , d i , x i ), where t i is either survival or censoring time, and d i satisfies d i ~1 if t i is survival time and d i ~0 otherwise.In the Cox regression [8], the hazard function for individual i is modeled as where b~( b 1 , :::, b p )' are unknown coefficients and h 0 is an unknown baseline hazard function.Let R i ~f ' : t ' §t i g be the risk set that contains individuals who still survive at time t i .The regression estimate is obtained by maximizing the partial likelihood given as When the dimension p is large relative to the sample size n, the maximum of L 1 n (b) is not uniquely determined.An intuitive and widely used approach to resolve highdimensionality is based on the univariate selection.As the initial step, a Cox regression based on the univariate model h(tDx ij )~h 0j (t) exp (b j x ij ), or a log-rank test between the high and low covariate groups, is performed for each j~1, :::, p, oneby-one.Then one picks out a subset of covariates that have low Pvalues from the univariate analysis (e.g., Jenssen et al. [1]).The top t covariates with lowest P-values are then included in a multivariate Cox regression, where the number t can be determined by cross-validation and/or biological consideration.Although the univariate selection is easy to implement, the process of selecting covariates is solely based on the marginal significance, and hence there is no guarantee that the resultant multivariate model achieves an accurate prediction.
A more sophisticated approach to resolve high-dimensionality is to utilize the L 1 penalized partial likelihood or the L 2 penalized partial likelihood where lw0 is the tuning (shrinkage) parameter.The two methods shrink the coefficients to zero.The estimator resulting from equation ( 3) is called the Lasso [10,11,12].An important feature of the Lasso is that many coefficients will be estimated exactly as zero.This implies that the Lasso can be used as a variable selection tool for a parsimonious prediction model.On the other hand, the estimation based on equation ( 4) is called ridge regression [13,14], which results in p non-zero coefficient estimates.Therefore, unlike the Lasso, the prediction model from ridge regression uses all the covariates.The tuning parameter l can be obtained empirically by a cross-validation criterion proposed by Verweij and van Houwelingen [20].Both the Lasso and ridge regression methods are implemented through the R package ''penalized'' [21].
There are a number of other methods available to handle highdimensional covariates, including the forward stepwise selection, principal components, supervised principal components, Lasso principal components, partial least squares regression, and treebased methods, etc.; refer to Witten and Tibshirani [22] for an excellent summary.Bovelstad et al. [17], van Wieringen et al. [18], and Bovelstad and Borgan [19] systematically compared these methods and concluded that ridge regression has the best overall performance for survival prediction.However, the compound covariate method has not been included in these comparative studies.

Compound Covariate Prediction
For a future subject with a covariate vector x~( x 1 , . . ., x p )', the survival prediction can be made by the prognostic index (PI) defined as w'x, where w~( w 1 , :::, w p )' is a vector of weights.Typically, w is determined by the dataset f (t i , d i , x i ); i~1, . . ., n g and is chosen so that w'x is associated with the subject's survival.When p is small relative to n, the multivariate Cox's partial likelihood estimator maximizing equation (2) can be used for w.Alternatively, one can set w j to be the estimated regression coefficient for b j by fitting the univariate Cox model h(tDx ij )~h 0j (t) exp (b j x ij ), for each j~1, :::, p, one-by-one.This prediction method is called the compound covariate prediction [23] and it is applicable even when p. n.The method has been shown to be useful in medical studies with microarrays as a convenient and powerful tool for survival prediction [5,6,15,16].Note that even when p, n, where a multivariate Cox regression is applicable, the compound covariate prediction may further improve predictive power.We will demonstrate this aspect through the analysis of the primary biliary cirrhosis of the liver data.

Refinement of the Compound Covariate Method
The construction of the compound covariate predictor is purely based on the univariate (marginal) likelihood functions.This methodology may be further improved by incorporating the combined predictive information of covariates into the compound covariate predictor.Here we propose a mixture of the multivariate and univariate (marginal) likelihoods.For each covariate j (~1, :::, p), the univariate Cox regression estimator for b j is obtained by maximizing We combine the likelihoods in equation ( 5) over all j (~1, :::, p), namely, Note that the maximizer of L 0 n (b) is found as the set of the p univariate Cox regression estimates even when pwn, and hence L 0 n (b) adapts easily to high-dimensionality.On the other hand, L 1 n (b) does not have a unique solution when pwn, although it potentially contains the combined predictive information of covariates.To gain an adequate compromise between L 1 n (b) and L 0 n (b), we consider a mixture log-likelihood where a[½0, 1 is the tuning (shrinkage) parameter.For a fixed a[½0, 1), the maximizer of equation ( 6 0)'x, with a larger a leading to a larger degree of multivariate likelihood information (Figure 1).It will be seen that the value of a can be empirically estimated by cross-validation.
The idea of the compound shrinkage as a mixture of the multivariate and univariate likelihoods is closely related to a ''shrinkage'' scheme in statistical literature.This has the effect of reducing (shrinking) the infinite dimensional solution space of the multivariate likelihood equations toward the unique nearest point of b b(0) as demonstrated in Figure 1.Here, a = 0 stands for the maximal shrinkage and a = 1 for no shrinkage.

Choosing the Shrinkage Parameter by Cross Validation
The shrinkage parameter a in equation ( 6) should be chosen so that the predictive power of b b(a)'x is maximized.For this purpose, we adopt a cross-validation criterion based on partial likelihood [20].To perform a K-fold cross validation, we first divide n individuals into K groups of about equal sample sizes, and label them as = k for k~1, :::,K.The maximizer of equation ( 6) based on all individuals not in = k is calculated and denoted by b b ({k) (a).Repeat this process for k~1, :::,K, and the cross-validation criterion is where ' 1 n, ({k) (b) is the log-partial likelihood based on all individuals not in = k .Finally, we find â a that maximizes equation (7).The numbers K~5 or K~10 are used commonly when n or p is large [16,17,24].Since the resultant estimators â a and b b(â a) are fairly robust against the choice of K in our simulations, we recommend K~5 for computational simplicity.

Evaluation Criteria
We first revisit the three measures for prediction accuracy proposed by Bovelstad et al. [17] ; i~1, . . ., n g.The P-value for testing the hypothesis H 0 : a~0 represents a measure of prediction ability.Smaller P-value corresponds to better prediction ability.
3) Deviance (Devi): Let ' Ã n (b) be the log-partial likelihood function calculated from the test dataset.The deviance {2f ' Ã n ( b b){' Ã n (0) g measures how the model with b~b b improves the null model with b~0 in terms of goodness-of-fit in the test dataset.Smaller deviance corresponds to better prediction ability.
We further consider the c-index proposed by Harrell et al. [25,26], which is a widely used measure for predictive accuracy for censored survival data: , Larger c-index corresponds to better prediction and cindex = 0.5 means no prediction ability.The c-index is a less subjective measure than the LR-test and Cox-test; it requires no choice of a cut-off point for categorizing PI as in the LR-test, and requires no model-fitting as in the Cox-test.The c-index is implemented in R (survConcordance routine in ''survival'' package) and other software [26].

Simulation Set-up
The objective is to compare the prediction ability of the compound covariate method, the compound shrinkage method, and other existing methods.Comparative studies of Bovelstad et al. [17], van Wieringen et al. [18] and Bovelstad and Borgan [19] all demonstrated that ridge regression has the overall best predictive performance among many well-known survival prediction methods, including the univariate selection, forward selection, Lasso, principal components, supervised principal components, partial least squares, random forests, etc.On the other hand, Gui and Li [11], Segal [12] and Bovelstad and Borgan [19] still report some cases in which the Lasso-type methods perform better.Hence, we focus on the two benchmark methods of ridge regression and Lasso as representatives of existing methods.
We set the p-dimensional regression parameter b'~( b 1 , :::, b q , b qz1 , :::, b p ) in the Cox model ( 1) with p = 100.Note that we also considered p = 50 and 200 but obtained similar results as reported in tables S1-1 , S1-4 in Supporting Information S1.Consider a case, in which some of covariates are related to survival time; the coefficients of the first q covariates are nonzero and those of the remaining p -q covariates are zero.We examined (I) sparse cases (q = 2, 4, 5 or 10) and (II) less sparse cases (q = 10, 15, 20 or 30).Note that both the sparse and nonsparse settings are plausible in biological problems [27].For the covariates x'~( x 1 , :::, x q , x qz1 , :::, x p ), we adopt the following random effects models to introduce correlations among the covariates with a correlation coefficient equal to 0.5: Scenario 1 (tag genes): Each of the q covariates is positively correlated to s covariates that have zero coefficients.Specifically, we set jƒ q ; j~qz(k{1)sz1, :::, qzks, k~1, :::, q ; j §qzqsz1 where A j ~U({0:75, 0:75), u j ~U({0:75, 0:75), U j ~U({1:5, 1:5), and they are independent of one another.This scenario represents the setting that q independent sets of genes are associated with survival; the (s +1) genes in each set are correlated, and after accounting for one ''tag gene'' in each set of genes, the other genes have no net effects on survival.
The former represents the setting that there exists a ''gene pathway'' of q correlated genes that jointly affect survival, and the latter does for two gene pathways of q/2 correlated genes.Hence, scenario 2 represents a setting where the genes informative for survival are correlated while scenario 1 represents a setting where the informative genes are independent of each other.
For both scenarios, the covariates are standardized so that they have standard deviation 1.The Cox model in (1) with h 0 (u)~1 is chosen to generate survival times.Censoring times are generated from U(0, 1), which yields moderate censoring (54,63%).We first generate a training dataset of n~100 individuals, and calculate b b, where b b is the compound covariate, compound shrinkage, ridge regression or Lasso estimator.K~5 crossvalidation is used to obtain the shrinkage parameters a for the compound shrinkage estimator and l for ridge regression and Lasso estimators.Ridge regression and Lasso analyses are implemented through the R package ''penalized'' [21].Then, we generate the test dataset of size n~100, independently of the training dataset, to calculate the prediction measures of LR-test, Cox-test, Devi, and c-index.
In the subsequent simulations, we follow Bovelstad et al. [17] to compare the values from the LR-test, Cox-test, Devi and c-index by their median among 50 replications of training/test datasets.

Simulation results
The results for the sparse cases (q = 2, 4, 5 or 10) are given in Table 1.The Lasso generally works best in all prediction measures.This pattern is only violated for the relatively large number of significant covariates (q = 10) where the compound covariate or compound shrinkage method achieves better performance in terms of the LR-test, Cox-test and c-index.Ridge regression usually performs worst in terms of the LR-test, Cox-test, and c-index.The compound shrinkage method is quite comparable in the LR-test, Cox-test, and c-index to the compound covariate method in all cases.
The four methods: CC = compound covariate, CS = compound shrinkage, Ridge = ridge regression, and Lasso = Lasso analyses are compared.The median values among the 50 replications for the LR-test (log 10 P-value), Cox-test (log 10 P-value), Devi, c-index, and tuning parameters â a or l l are reported.
The results for the less sparse cases (q = 10, 15, 20 or 30) are given in Table 2. Unlike the sparse cases, the Lasso usually performs worst in terms of the LR-test, Cox-test, and c-index, especially in scenario 1 where the Lasso estimates often result in the null model that has no prediction power (Devi = 0.000, cindex = 0.501, 0.538).Overall, the comparative performance of the compound covariate, compound shrinkage, and ridge regression methods are similar, but in scenario 2, the compound covariate and compound shrinkage methods perform better than the Lasso and ridge regression methods.
The four methods: CC = compound covariate, CS = compound shrinkage, Ridge = ridge regression, and Lasso = Lasso analyses are compared.The median values among the 50 replications for the LR-test (log 10 P-value), Cox-test (log 10 P-value), Devi, c-index, and tuning parameters â a or l l are reported.
In terms of the Devi, ridge regression and Lasso methods have much better performance than both the compound covariate and compound shrinkage methods.In fact, the Devi may be unfair to the proposed approach; the Devi measures a distance of b b from the benchmark value of b~0, and the majority of regression coefficients obtained by ridge and Lasso are very close to or exactly 0 by construction.In contrast, the compound covariate and compound shrinkage methods have poorer performance in the Devi because they are not shrunk to 0. However, poorer performance in the Devi is not carried over to other measures based on association between the prognostic index and the survival time, i.e., the LR-test, Cox-test, and c-index.
To see the robustness of the proposed method to the crossvalidation scheme, we perform the same set of simulations using K~10 cross-validation in place of K~5.The results (not shown) are virtually identical to these in Tables 1 and 2. Hence, the performance of the compound shrinkage method is less affected by the number of folds used in the cross-validation.
Although we found no single best method across all cases, the comparative performance of the compound covariate and compound shrinkage methods with other methods is remarkable.Unlike ridge and Lasso analyses that may exhibit poor performance in certain specific cases, the compound covariate and compound shrinkage methods provide more stable performance across different settings with sparse/non-sparse, independent/ correlated informative genes.This robustness property is desirable in practical applications.
We perform similar simulations by increasing the magnitude of non-zero coefficients.As reported in tables S1-5 and S1-6 in Supporting Information S1, prediction performance improved for all four methods, but the relative performances among them are similar to those seen in Tables 1 and 2.

The Primary Biliary Cirrhosis Data Analysis
The primary biliary cirrhosis (PBC) data used in Tibshirani [10] contains 276 patients with 17 covariates.Among them, 111 patients died while others were censored.The covariates consist of a treatment indicator, age, sex, 5 categorical variables (ascites, hepatomegaly, spider, edema, and stage of disease) and 9 continuous variables (bilirubin, cholesterol, albumin, urine copper, alkarine, SGOT, triglycerides, platelet count, and prothrombine).We use log-transformed continuous covariates to get stable results.We compare the prediction performance over 50 random 2:1 splits with 184 patients in the training set and 92 patients in the testing set.
Table 3 reports the results for comparing the compound covariate, compound shrinkage, multivariate Cox regression, ridge regression and Lasso analyses.Multivariate Cox regression analysis exhibits the worst performance, possibly due to a large number of covariates.The other four methods that adapt to highdimensionality exhibit higher prediction power.Of these methods, the compound covariate method performs best in terms of the LRtest, Cox-test and c-index.This implies that the compound covariate has the highest ability to discriminate between the poor and good prognostic patients in the testing set.Notice that the poor Devi value of the compound covariate method does not affect its prediction power for patients' prognosis.NOTE: The median among the 50 replications for the LR-test (log 10 P-value), Cox-test (log 10 P-value), Deviance, c-index, and tuning parameters â a or l l are reported.Smaller values of the LRtest, Cox-test and Deviance, and larger values of the c-index correspond to more accurate prediction performance.

The Lung Cancer Data Analysis
The non-small-cell lung cancer data of Chen et al. [6] is available from http://www.ncbi.nlm.nih.gov/projects/geo/, with accession number GSE4882.The data contains 672 gene profiles for 125 lung cancer patients.Among them, 38 patients died while others were censored.We use a subset consisting of 485 genes whose coefficient of variation in expression values is greater than 3%.We divide the patients into 63:62 training/test datasets as in Chen et al. [6].Univariate Cox regression analysis based on the training set identifies 16 genes that are significantly related to survival (P-value ,0.05).Chen et al. [6] used the 16 regression coefficients to classify the patients of the test dataset into good or poor status.This 16-gene method is a compound covariate analysis applied to the selected set of genes, though the compound covariate method is applicable for the full sets of 485 genes.To illustrate the compound covariate and the compound shrinkage methods with high-dimensional covariates, we select p = 97 genes whose P-values of the univariate analysis are less than 0.20 in the training dataset of n = 63, and set the coefficients of remaining genes to zero.We compare the compound covariate, compound shrinkage, ridge regression, and Lasso methods as well as the 16-gene compound covariate method of Chen et al. [6].The results are summarized in Table 4.In terms of the LR-test, the compound covariate method performs best, while, in terms of the Cox-test and c-index, the compound shrinkage method performs best.Figure 2 shows that the two survival curves for the good and poor prognosis groups are best separated by the compound covariate method.However, Figure 3 shows that the Kaplan-Meier curves for the good, medium and poor prognosis groups cross one another and are less distinguishable by the compound covariate method.Here the good, medium, and poor groups are determined by the tertiles of the PI's in the test datasets.On the other hand, the three Kaplan-Meier curves are well-distinguished in the compound shrinkage method, as implied by its best performance in the Cox-test and c-index (Figure 3; Table 4).This analysis suggests that, compared to the compound covariate method, the compound shrinkage method may provide more accurate ranking of patients' risks with respect to their survival status.Although ridge regression and Lasso has much smaller deviance, it has poorer performance in the LR-test, Cox-test and c-index.
To see the robustness of the conclusion, comparison of the methods is made under various different numbers of genes, including p~124 genes whose P-values of the univariate analysis are less than 0.25.As seen from the Supporting Information S2, the compound covariate method still performs best in terms of the LR-test.However, the compound shrinkage method still has the best performance in the Cox-test and c-index, and it provides the best separation among the survival curves for the good, medium, and poor prognosis groups.In fact, the compound shrinkage method almost always has the best c-index values under varying number of genes passing a univariate pre-filter for inclusion in the PI (Figure 4).Hence, the conclusion is unchanged.
We also compared the computation time of the four methods in Table 4.The compound covariate method achieves the fastest computation time since it merely repeats p = 97 univariate Cox regressions using the R ''coxph'' routine.Ridge regression requires about 5 times and Lasso has about 7 times longer computation time than the compound covariate method.The compound shrinkage is decidedly the slowest, due to the cost of finding highdimensional maxima b b(â a) and b b ({k) (a).

Large Sample Results for the Shrinkage Method
The first analytical result of the compound shrinkage method is the large sample consistency of the survival prediction.That is, as n?? with fixed p, the estimated shrinkage parameter â a tends to 1 and the compound shrinkage estimator b b(â a) tends to the true parameter value b 0 .The second and more practically important result is a formula for the standard deviation of b b(â a) that may be useful for calculating P-values for each covariate.
To describe the analytical properties of â a and b b(â a), define, for k~0, 1, 2, where x 0 i :1, x 1 i :x i , x 2 i :x i x' i and Y i (t)~I(t i §t) with I( : ) being an indicator function, and for j~1, :::, p, (1) (b; t)=S (0) (b; t), e(b; t)~s (1) (b; t)=s (0) (b; t), E j (b j ; t)~S (1)  j (b j ; t)=S (0) j (b j ; t), e j (b j ; t)~s (1)  j (b j ; t)=s (0) j (b j ; t), V(b; t)~S (2) (b; t)=S (0) (b; t){E(b; t)E(b; t)', V j (b j ; t)~S (2)  j (b j ; t)=S (0) j (b j ; t){E j (b j ; t) 2 : The score function defined as the derivative of ' a n (b) with respect to b is given by The observed Fisher information matrix, the negative of the Hessian of ' a n (b), is where diag( V 1 (b 1 ; t i ), :::, V p (b p ; t i ) ) is the diagonal matrix with the diagonal element ( V 1 (b 1 ; t i ), :::, V p (b p ; t i ) ).It is easy to verify that V a n (b) is positive semi-definite and hence ' a n (b) is where I p is the unit matrix of size p.The estimator S â a n ( b b(â a) ) gives reasonable approximation to the variance of b b(â a) even when p is large (see simulations for p = 100 and n = 100 in Supporting Information S3).The variance estimate facilitates the Wald-type test for significance of the regression coefficients.

Analytical Comparison with the Lasso and Ridge Regression
Unlike the Lasso and ridge regression in equations ( 3) and ( 4), which shrink the regression coefficients toward 0~( 0, :::, 0 )', the compound shrinkage estimator is obtained by shrinking the coefficients toward the compound covariate estimator b b(0)~( b b 1 (0), :::, b b p (0) )'.We apply a statistical large sample theory on the misspecified Cox regression analysis [29,30] to demonstrate that shrinking the regression coefficients toward the compound covariate estimator may be more informative than shrinking toward 0 when covariates are independent.When n goes to infinity, the compound  With complete shrinkage, the difference between the two estimators becomes evident since b b Shrink (0)~f diag(X'X) g {1 X'y while b b Ridge (?)~0.

Computing Algorithms
Numerical maximization of ' a n (b) in equation ( 6) can be done through quasi-Newton type algorithms.For instance, the R ''nlm'' is a reliable routine to find the minimum of {' a n (b) with a large p. Numerical maximization of CV (a) in equation ( 7) can be obtained by a grid search on finely selected values of a as commonly done in cross-validation [17,24].In our numerical studies we observe that the graph of CV (a) is always unimodal, and calculating CV (a) with smaller a is always faster than with larger a. Utilizing these properties, we suggest the following computation algorithm, which is more efficient in computation than the ''exhaustive search'' procedure: Step 1: Set a~0 and a positive number Da (e.g., Da~0:025), and calculate CV (0).
Step 3: Stop the algorithm and set â a~a.

Conclusions
We have revisited a compound covariate prediction method for predicting survival outcomes with a large number of covariates.This method is popularly employed in medical studies, but its statistical performance has been less studied in the literature.We investigate the prediction power of the method by comparison with the well-known methods of ridge regression and Lasso, both of which adapt to a large number of covariates.The simulations demonstrate that the compound covariate method has better predictive power than ridge regression when only a few among a large number of covariates associate with the survival (i.e., sparse cases), and that it performs better than the Lasso when many of a large number of covariates simultaneously affect the survival (i.e., less sparse cases).The compound covariate method exhibits best predictive power among all the competitors in the primary biliary cirrhosis dataset, including the multivariate Cox regression, ridge regression and Lasso.In the even much higher dimensional lung cancer microarray data, where the multivariate Cox regression no longer applies, the compound covariate method similarly outperforms ridge regression and Lasso.Hence, the compound covariate method is a computationally attractive and powerful technique for survival prediction with a moderate or large number of covariates.
To further improve the prediction power of the compound covariate prediction, we propose a novel shrinkage type estimator for survival prediction with a large number of covariates.The new shrinkage scheme refines the compound covariate method by incorporating the multivariate likelihood information into the compound covariate predictor.Our simulation studies demonstrate that, in the sparse signal setting, the Lasso strongly outperforms the ''non-sparse'' methods, including ridge regression, compound covariate and compound shrinkage methods.On the other hand, in settings with less sparse signals, the compound covariate and compound shrinkage methods perform comparably to ridge regression, and all these methods outperform the Lasso method.Given that the non-sparse setting is not uncommon [27], and ridge regression shows best overall performance in several comparative prediction studies [17,18,19], the compound covariate and compound shrinkage methods have the potential to be useful alternatives.Our proposal also provides a novel framework of shrinkage estimation that encompasses the simple but effective compound covariate method as a special case.In the lung cancer data analysis we find that, the major advantage of the proposed compound shrinkage method over the compound covariate method is in its more accurate prediction of patient's survival status.We also establish statistical large sample theories, including consistency and standard error estimation of the parameter estimator, for the proposed shrinkage method.Given these numerical and theoretical evidences, the proposed prediction scheme seems to be a method that can be reliably applied for survival prediction.The method is implemented by an R package ''compound.Cox'' available in CRAN at http://cran.r-project.org/.
A potential extension of the proposed shrinkage method is the development of covariate selection.This is clearly an important issue in microarrays in which the focus is to select genes that achieve good predictive power.If the gene selection is the main focus, we find the Lasso method offers an elegant solution since it gives an automatic way of selecting genes.In fact, the Lasso shows excellent performance when the signal is sparse, as shown in our simulation studies (Table 1).However, in the presence of a large number of informative genes (less sparse cases), the performance of the Lasso is less reliable since it tends to select only a few genes among them and often results in the null model with no prediction power (Table 2).A large number of informative genes are also encountered in the lymphoma data reported in Matsui [16], where the number of genes in the optimal set is t = 75 or 85.Matusi [16] suggests a gene filtering procedure that chooses the top t genes in terms of univariate Cox analyses, where t is the threshold that leads to the best predictive power in cross validation.Although this methodology is computationally simple, the top t genes are based on univariate significance only.Hence, it is interesting to extend the gene filtering approach to take into account the combined, multivariate predictive information of genes using the proposed shrinkage method.We will leave this problem to a future research topic.
) is denoted by b b(a).We will call b b(a) the compound shrinkage estimator, and b b(0) the compound covariate estimator, which is a special case of b b(a) at a~0.The compound shrinkage predictor b b(a)'x can thus be viewed as a generalization of the compound covariate predictor b b(

2 )
, n g be a training dataset and b b an estimator obtained from the training dataset, and let f (t Ã i , d Ã i , x Ã i ); i~1, . . ., n g be a test dataset.1) Log-rank test (LR-test): Subject i in the test dataset is categorized in the good (poor) prognosis group if b b'x Ã i is below (above) the median of f b b'x Ã i ; i~1, . . ., n g.The P-value for a log-rank test performed in the test dataset for comparing survival times in the two groups represents prediction performance.Smaller P-value corresponds to better prediction ability.Cox regression test (Cox-test): By treating g Ã i ~b b'x Ã i as a covariate, the Cox model h(tDg

Figure 3 .
Figure 3. Kaplan-Meier curves for the 62 patients in the lung cancer data of Chen et al. [6].Good (blue), medium (black), and poor (red) groups are determined by the tertile of the PI's in the test dataset.doi:10.1371/journal.pone.0047627.g003

Table 1 .
Simulation results under sparse cases with p = 100 and n = 100 based on 50 replications.

Table 2 .
Simulation results under less sparse cases with p = 100 and n = 100 based on 50 replications.

Table 3 .
Performance of the five methods based on the primary biliary cirrhosis of the liver data. doi:10.1371/journal.pone.0047627.t003

Table 4 .
[6]formance of the five methods based on the nonsmall-cell lung cancer data of Chen et al.[6].Smaller values of the LR-test (log 10 P-value), Cox-test (log 10 P-value) and Deviance, and larger values of the c-index correspond to more accurate prediction performance.*Ifgoodandpoorgroups are separated by the median PI in the training set, the LR-test has P-value = 0.034 (log 10 P-value = 21.47) with n = 28 in the good and n = 34 in the poor groups (the same result as Figure1Cof Chen et al.[6]).Now we state the large sample results as n?? with fixed p; the proofs are given in Supporting Information S3.Assume that f (t i , d i , x i ); i~1, . . ., n g are independently and identically distributed under the model (1) with b~b 0 , and a[½0, 1 is a fixed constant.Applying martingale calculus and the concave property