Fig 1.
Overall scheme of StellarPath.
(A) Workflow of StellarPath. The method classifies patients into two classes and works with both dense and sparse (e.g., somatic mutation) omics. It finds the significantly deregulated molecules and their enriched pathways. It measures how similar each pair of patients is by comparing the values of the molecules belonging to a specific pathway. It uses these pathway-specific similarities to build a patient similarity network (PSN). Only the PSNs in which one class is cohesive while the opposite one is not are kept. StellarPath receives an unknown patient, adds it to a significant PSN, and predicts its class with a graph convolutional network (GCN). StellarPath provides multiple outputs: (B) significant pathway-specific PSNs show how well the compared classes are separated (x-axis) and how strong the cohesion is (y-axis) within the cohesive class (color); (C) a PSN represents the pathway’s similarities due to differentially expressed and differentially stable molecules; (D) both known and unknown patients are stratified and quality checked based on how similar they are in the pathways.
Fig 2.
Violin plot of the classification performances (Table C in S1 Tables) measured with the Matthews Correlation Coefficient (MCC). For netDx, each MCC value is computed from the predictions made by the classifier applied to one non-pathway consensus PSN in each run of cross-validation. For StellarPath, there are two types of violins. Each MCC value in a StellarPath GCN violin is computed from the predictions made by a GCN using one pathway-specific PSN. Each MCC value in a StellarPath Ensemble violin is computed from the predictions obtained by majority voting over all the trained GCNs. The dot in each violin marks its mean.
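As a minimal sketch of the two quantities named in this caption, the Python below computes a binary MCC and a majority vote over several classifiers. The function names and the toy labels are hypothetical and are not taken from StellarPath or netDx; ties in the vote are not handled specially, so an odd number of classifiers is assumed.

```python
from collections import Counter
from math import sqrt


def mcc(y_true, y_pred, positive=1):
    """Matthews Correlation Coefficient for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is set to 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def majority_vote(per_model_preds):
    """Return, for each patient, the label predicted by most models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_preds)]


# Toy data (hypothetical): six patients, three classifiers.
y_true = [1, 1, 0, 0, 1, 0]
model_preds = [
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
]

single_scores = [mcc(y_true, preds) for preds in model_preds]  # one MCC per model
ensemble = majority_vote(model_preds)
ensemble_score = mcc(y_true, ensemble)  # one MCC for the ensemble
```

In this toy case each classifier makes at least one error, while the vote over the three recovers every label, which illustrates why an ensemble violin can sit above the single-model violins.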
Table 1.
StellarPath and netDx predictive pathways.
Type and number of significant and predictive pathways resulting from StellarPath analysis and netDx analysis.
Fig 3.
(A) Scatter plot comparing the separability of the patient classes in the PSNs selected by StellarPath and netDx for classifying the datasets. The separability of the classes is assessed with StellarPath’s Power ranking system. Each dot’s size indicates the number of PSNs with the corresponding Power value on the Y-axis. The dot’s color indicates which method has the greater number of PSNs with that Power. (B) Bar plot of the Jaccard index (Y-axis) measuring the agreement between the unsupervised patient clusters identified in the PSNs and the real patient classes of each dataset. A taller bar indicates a closer overlap between the unsupervised patient clusters and the actual patient classes.
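For readers unfamiliar with the Jaccard index used in panel (B), the following is a minimal sketch of the standard set-based definition, the size of the intersection divided by the size of the union. How clusters are matched to classes in the actual analysis is not stated in this caption, so the one-cluster-versus-one-class comparison and the patient identifiers below are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard index between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical example: one unsupervised cluster versus one true class.
cluster = {"p1", "p2", "p3", "p4"}
true_class = {"p2", "p3", "p4", "p5"}

overlap = jaccard(cluster, true_class)  # 3 shared patients out of 5 in total
```

A value of 1 means the cluster and the class contain exactly the same patients, and a value of 0 means they share none.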
Fig 4.
Molecules and pathways relevance.
(A) Scatter plot of the pathways represented by predictive OGD WT signature PSNs. The X-axis indicates how cohesive the OGD WT samples are in the PSN associated with a pathway. The Y-axis indicates how cohesive the N WT samples are. The size of the dot indicates how many molecules significantly deregulate the pathway. The color indicates whether the pathway is associated with the OGD phenotype based on IPA. The best pathways are in the bottom right corner because they combine strong similarities between OGD WT samples with weak similarities between the N WT ones. (B) Scatter plot of the deregulated molecules belonging to the enriched pathways represented by predictive OGD WT signature PSNs. The X-axis indicates how stable a molecule is in the OGD WT samples compared with the controls. The Y-axis indicates how strongly a molecule is deregulated between OGD WT and N WT. The best molecules are in the top right part of the plot because they are the most deregulated and the most stable in OGD WT.
Table 2.
Number of significant pathways found by StellarPath and GSEA.
Fig 5.
(A) Scatter plot of the pathways represented by predictive UM-CLL signature PSNs that StellarPath found during training. The X-axis indicates the rank of separability between the two classes. The Y-axis indicates the name of the pathways. The size of the dot depends on the smallest similarity (low percentile of the Separability Power system) between UM-CLL patients that is higher than the highest similarity (high percentile) between M-CLL patients. The color indicates how the UM-CLL subtype deregulates the pathway. (B) Violin plot of the classification performances assessed by comparing the predicted classes of the unknown patients with their true subtype. (C) UM-CLL signature PSN of the activated B Cell Receptor Signaling Pathway of Power 2. UM-CLL patients are represented by green nodes and M-CLL patients by blue nodes. Green lines are edges between UM-CLL nodes, blue lines are edges between M-CLL nodes, and grey lines are inter-class similarities. The thickness of a line represents the similarity value associated with it. The size of a node is determined by the patient’s centrality. The PSN is sparse because the edges with low similarity have been hidden. Thanks to StellarPath’s plot function, the PSN represents 123 patients in 20 nodes.
Fig 6.
The bar plots show the computational resources required by StellarPath and netDx to classify the datasets. Running time is measured in hours, and memory usage is reported as the maximum amount of RAM, in gigabytes, used by the software. The X-axis indicates the datasets. The Y-axis indicates the measurement (Table O in S1 Tables).