ORN: Inferring patient-specific dysregulation status of pathway modules in cancer with OR-gate Network

Pathway-level understanding of cancer plays a key role in precision oncology. However, the current amount of high-throughput data cannot support the elucidation of full pathway topology. In this study, instead of directly learning the pathway network, we adapted the probabilistic OR gate to model the modular structure of pathways and regulon. The resulting model, OR-gate Network (ORN), can simultaneously infer pathway modules of somatic alterations, patient-specific pathway dysregulation status, and downstream regulon. In a trained ORN, the differentially expressed genes (DEGs) in each tumour can be explained by somatic mutations perturbing a pathway module. Furthermore, the ORN handles one of the most important properties of pathway perturbation in tumours, the mutual exclusivity. We have applied the ORN to lower-grade glioma (LGG) samples and liver hepatocellular carcinoma (LIHC) samples in TCGA and breast cancer samples from METABRIC. Both datasets have shown abnormal pathway activities related to immune response and cell cycles. In LGG samples, ORN identified pathway modules closely related to glioma development and revealed two pathways closely related to patient survival. We had similar results with LIHC samples. Additional results from the METABRIC datasets showed that ORN could characterize critical mechanisms of cancer and connect them to less studied somatic mutations (e.g., BAP1, MIR604, MICAL3, and telomere activities), which may generate novel hypotheses for targeted therapy.


Introduction
• §2: "For example, PARADIGM [4] adapted known pathways as Markov networks and inferred pathway activities from the observed multi-omics profiles with the EM algorithm." This description needs clarification. I am not sure what is meant by the term "Markov networks", and the "EM algorithm" is neither spelled out nor explained.
More generally, the introduction lacks a complete overview of related work, with appropriate references.
• §4 (starting "In this work, we present OR-gate Network (ORN)..."): This presentation of the so-called OR-gate networks is too vague. If I understand correctly, an ORN has 3 layers: the first includes the genes/proteins participating in the signalling pathway, the second is the gate, and the third provides the status of the target genes (i.e. the deregulated genes).
"Each layer in ORN is connected by logical OR gates such that the output of a node is true if any input to the gate is true." This is simply the definition of an OR gate. I do not see why it would handle mutually exclusive perturbation patterns, as those would rather require an XOR gate.
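To make this concern concrete, here is a minimal sketch comparing the two gates on two inputs. It is my own illustration, not taken from the manuscript, and the function names are hypothetical. The only row where the gates differ is the co-occurrence case (both inputs true), which a plain OR gate accepts and an XOR gate rejects.

```python
from itertools import product


def or_gate(a: bool, b: bool) -> bool:
    """Logical OR: true if at least one input is true."""
    return a or b


def xor_gate(a: bool, b: bool) -> bool:
    """Logical XOR: true if exactly one input is true."""
    return a != b


# Enumerate the full truth table for two inputs.
rows = [
    (a, b, or_gate(a, b), xor_gate(a, b))
    for a, b in product([False, True], repeat=2)
]

print("a      b      OR     XOR")
for a, b, o, x in rows:
    print(f"{a!s:<6} {b!s:<6} {o!s:<6} {x!s:<6}")

# Rows on which the two gates disagree (hypothetical helper variable).
disagreements = [(a, b) for a, b, o, x in rows if o != x]
```

The gates disagree only when both inputs are true, so an OR gate on its own does not penalise co-occurring alterations.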

Material and Methods
• Data preprocessing section: the choice of the range [-2,2] for the CNV booleanisation deserves a justification, as do the parameters used for the booleanisation of the transcriptomic data. Please also explain the meaning of the SGA matrix (Fig 2).
• Leaky OR-gate section: the first formula is incomprehensible. The notation should be clarified: state that X_1 → y means "X_1 has an effect on y". Later, Y and y are confused. Please define δ as δ_i = Pr(X_i → y).
Furthermore, if I understand correctly, in Fig 3 the values indicated at the level of X correspond to Pr(X_i → y), whereas Pr(X_i = 1) is given by the observation (and is thus not really a probability).
In any case, because this framework seems closely related to the "noisy-OR gates" of ref 19, I would suggest citing this reference here and using its description: (1) each of the causes X_i has a probability Pr(X_i → y) of being sufficient to produce the effect in the absence of all other causes, and (2) the ability of each cause to be sufficient is independent of the presence of other causes.
The differences from noisy-OR gates should then be clarified. In particular, are the Pr(X_i = 1) the numbers labelling the edges in Fig 3B?
Note that the leaky parameter is not a novelty, as it was already considered in ref 19.
• OR-gate network section: this section is particularly obscure to me.
How are pathways (and thus their number P) inferred? The box "ORN inference" of Fig 2 needs explanation. How is matrix U determined? Does it correspond to the probability Pr(mutation, diff expression)?
The need to optimize the ORN should also be explained. I suppose that this is to obtain the probabilities Pr(X_i → y), but mostly to obtain the so-called pathways?
I was not able to check the remainder of the section, as µ and ζ were not defined. Explanations are needed.
• Simulation and evaluation section: "Synthetic data was generated by following the noisy OR gate process." It was said previously that the framework was not based on the noisy OR gate. In any case, what is the "noisy OR gate process"?
"To simulate the mutual exclusivity patterns observed in real data, we performed post pruning. That is, when several mutations belonging to the same pathway took place in the same sample, all but one of them were removed." I have several concerns with this statement. First, it assumes that mutual exclusivity systematically appears in the data, which is not always the case, as co-occurring alterations are also frequent. Furthermore, I wonder how cross-talk is handled in this framework: for one pathway you may expect the alterations of two genes to be mutually exclusive, while for two other, independent pathways this exclusivity is not expected.
• Causal relation extraction section: here again (as indicated before) I wonder how pathways are defined. I suppose that they are derived from the previous procedure, but this is not clear at all. Furthermore, what is the rationale underlying the choice of the number of pathways?
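
To illustrate my comments on the leaky OR-gate and simulation sections, the following is a minimal sketch of how I understand a leaky noisy-OR gate and the post-pruning step. It is my own illustration, not the authors' code. The formula is the standard leaky noisy-OR form, Pr(y = 1 | x) = 1 - (1 - leak) · Π_i (1 - δ_i)^(x_i); whether the manuscript uses exactly this form is an assumption, as are the function names, the parameter values, and the choice to prune after sampling y.

```python
import random


def leaky_noisy_or(x, delta, leak=0.0):
    """Pr(y = 1 | x) for a leaky noisy-OR gate.

    x     : list of 0/1 cause indicators (e.g. mutated or not)
    delta : list of delta_i = Pr(X_i -> y), one per cause
    leak  : probability that y is produced with no active cause
    """
    p_no_effect = 1.0 - leak
    for xi, di in zip(x, delta):
        if xi:
            p_no_effect *= 1.0 - di
    return 1.0 - p_no_effect


def post_prune(x, rng):
    """Keep at most one active cause, chosen uniformly at random."""
    active = [i for i, xi in enumerate(x) if xi]
    if len(active) <= 1:
        return list(x)
    keep = rng.choice(active)
    return [1 if i == keep else 0 for i in range(len(x))]


def simulate(n_samples, prior, delta, leak, seed=0):
    """Sample (x, y) pairs for one pathway, then prune x.

    prior : list of Pr(X_i = 1); hypothetical values chosen by the caller.
    Assumption: y is drawn from the unpruned x, and pruning is applied
    afterwards ("post" pruning). The manuscript does not state the order.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = [1 if rng.random() < p else 0 for p in prior]
        y = 1 if rng.random() < leaky_noisy_or(x, delta, leak) else 0
        samples.append((post_prune(x, rng), y))
    return samples


# Hypothetical parameters for a three-gene pathway.
samples = simulate(n_samples=1000, prior=[0.3, 0.3, 0.3],
                   delta=[0.8, 0.6, 0.9], leak=0.05)
max_active = max(sum(x) for x, _ in samples)
print("largest number of co-occurring alterations after pruning:", max_active)
```

By construction, no pruned sample carries more than one alteration in the pathway, so co-occurring alterations can never appear in the synthetic data, which is the basis of my first concern above.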