Recurrent somatic mutations reveal new insights into consequences of mutagenic processes in cancer

Fig 3

Workflow of the recurrence-based approach to group cancer genomes.

The 42 features are described in detail in S1 File (Step 1). We scale all features to zero mean and unit variance to compensate for differences in their ranges (Step 2). The arrows in the PCA plot indicate the direction and magnitude of the contribution of those features that contribute above average to the first two PCs (Step 3). Seven of these features are related to recurrence. An interactive 3D version of the PCA plot is available at https://plot.ly/~biomedicalGenomicsCNAG/1.embed. We retain a subset of the PCs and treat the remaining PCs as capturing noise (Step 4). For the hierarchical clustering, we use the Euclidean distance as the dissimilarity measure and Ward’s method as the linkage criterion (Step 5). The results of the hierarchical clustering serve as the starting point for k-means clustering (Step 6). In this step, some samples switch to a different cluster than the one assigned in the initial partition. This consolidation step is repeated at most 10 times. Further details on the annotation of the clusters (Step 7) are given in S3 Text.
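
Steps 2 to 6 of the workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cluster_samples` and the default values for `n_pcs` and `n_clusters` are hypothetical, since the caption does not state how many PCs are retained or how many clusters are cut from the tree. PCA is computed here through a plain SVD of the scaled matrix, and the k-means consolidation is written as an explicit loop that stops early once no sample changes cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_samples(X, n_pcs=3, n_clusters=4, max_iter=10):
    """Scale, project onto PCs, Ward-cluster, then consolidate with k-means.

    X is a (samples x features) matrix. n_pcs and n_clusters are
    illustrative defaults, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)

    # Step 2: scale every feature to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - X.mean(axis=0)) / std

    # Steps 3-4: PCA via SVD; keep the first n_pcs components and
    # discard the rest, which are treated as noise.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_pcs] * S[:n_pcs]

    # Step 5: hierarchical clustering with Euclidean distance and
    # Ward's linkage, cut into at most n_clusters groups.
    tree = linkage(scores, method="ward", metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust") - 1

    # Step 6: k-means consolidation seeded by the hierarchical partition,
    # repeated at most max_iter times.
    for _ in range(max_iter):
        ids = np.unique(labels)
        centroids = np.array([scores[labels == k].mean(axis=0) for k in ids])
        dists = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = ids[dists.argmin(axis=1)]
        if np.array_equal(new_labels, labels):
            break  # no sample switched cluster
        labels = new_labels

    return scores, labels
```

Calling `cluster_samples(X)` on a samples-by-features matrix returns the PC scores together with one cluster label per sample. Seeding k-means with the Ward partition, rather than with random centroids, makes the result deterministic and lets the consolidation only reassign samples that sit closer to another cluster's centroid.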

doi: https://doi.org/10.1371/journal.pcbi.1007496.g003