SUPPLEMENTARY MATERIAL
Detailed methods for classification approaches
To predict BRAF status, we used a binary classification tree, also known as CART (22). The fit was performed with the rpart package of the statistical software R, and the tree size was determined according to Breiman's 1-SE rule (22), which chooses the smallest number of terminal nodes whose cross-validation error lies within one standard error of the minimum. Other rules for determining tree size exist, but the choice depends on the demands of the problem at hand. Here, we strove for the most parsimonious solution without overfitting, opting for rules likely to yield smaller trees. Trees were fit with morphological variables alone and also with all variables. Surrogates of a primary split are defined as the variables that produce child-node allocations most similar to those of the primary split, whereas competitors are the variables that are second (third, etc.) best at minimizing the impurity of the child nodes. For each split of a given decision tree, we evaluated the surrogates and competitors to better understand alternative tree topologies.
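The 1-SE pruning rule described above can be sketched as follows. This is an illustrative re-implementation only: the analysis in the text used R's rpart, whereas this sketch uses scikit-learn's CART implementation and synthetic data; all data and parameter values here are hypothetical.

```python
# Sketch of the 1-SE rule for choosing CART tree size (assumes scikit-learn;
# the original analysis used R's rpart). Synthetic data, illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Cost-complexity pruning path: candidate alphas, each defining a subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]  # drop the alpha that prunes down to the root

# Cross-validated misclassification error for each candidate subtree.
errors, ses = [], []
for a in alphas:
    scores = cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=10)
    err = 1.0 - scores
    errors.append(err.mean())
    ses.append(err.std(ddof=1) / np.sqrt(len(err)))

errors, ses = np.array(errors), np.array(ses)
best = errors.argmin()
threshold = errors[best] + ses[best]

# 1-SE rule: the largest alpha (hence smallest tree) whose cross-validation
# error is within one standard error of the minimum.
chosen_alpha = alphas[errors <= threshold].max()
tree = DecisionTreeClassifier(random_state=0, ccp_alpha=chosen_alpha).fit(X, y)
print("terminal nodes:", tree.get_n_leaves())
```

Choosing the largest qualifying alpha mirrors the parsimony goal stated above: among subtrees that are statistically indistinguishable from the best, prefer the smallest.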
Multiple trees were fit using the Random Forest algorithm (23). This method repeatedly bootstraps the data points and builds a tree on each resulting pseudo-sample, additionally subsampling the variables at each split. Random forests are known to provide superior accuracy and to resist overfitting because both samples and variables are resampled. The resulting collection of individual trees is referred to as a random forest, and the final prediction for each sample is the outcome of the majority vote of the trees in the forest. An unbiased estimate of the test-set error is obtained internally by predicting each sample only with the trees for which that sample was left out, obviating the need for cross-validation or a separate test set; this is the out-of-bag error rate estimate (26). We used the cforest function of the party package of R, an implementation that does not accept samples with missing values. We therefore excluded samples with missing values when comparing the performance of the single-tree and Random Forest algorithms.
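The resampling and out-of-bag ideas above can be illustrated in a few lines. This is a hedged sketch, not the paper's pipeline: the original analysis used R's party::cforest, whereas this uses scikit-learn's random forest on synthetic data; the sample size and number of trees are arbitrary.

```python
# Sketch of random forest with out-of-bag (OOB) error estimation (assumes
# scikit-learn; the original analysis used R's party::cforest). Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

forest = RandomForestClassifier(
    n_estimators=500,      # number of bootstrapped trees
    max_features="sqrt",   # subsample variables at each split
    oob_score=True,        # score each sample only with trees that never saw it
    random_state=0,
).fit(X, y)

# OOB error: an internal test-set estimate, no separate hold-out needed.
oob_error = 1.0 - forest.oob_score_
print(f"out-of-bag error rate: {oob_error:.3f}")
```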
We also explored additional classifiers. A multivariate logistic regression was fit with the variables selected by the corresponding single tree. As an alternative model-selection method, we considered the Bayesian information criterion (BIC) (38), fitting a logistic regression with all variables simultaneously and performing backward deletion. All statistical analyses were conducted using the freely available R statistical language.