PATHLOGIC-S: A Scalable Boolean Framework for Modelling Cellular Signalling

Curated databases of signal transduction have grown to describe several thousand reactions, and efficient use of these data requires the development of modelling tools to elucidate and explore system properties. We present PATHLOGIC-S, a Boolean specification for a signalling model, with its associated GPL-licensed implementation using integer programming techniques. The PATHLOGIC-S specification has been designed to function on current desktop workstations, and is capable of providing analyses on some of the largest currently available datasets through use of Boolean modelling techniques to generate predictions of stable and semi-stable network states from data in community file formats. PATHLOGIC-S also addresses major problems associated with the presence and modelling of inhibition in Boolean systems, and reduces logical incoherence due to common inhibitory mechanisms in signalling systems. We apply this approach to signal transduction networks including Reactome and two pathways from the Panther Pathways database, and present the results of computations on each along with a discussion of execution time. A software implementation of the framework and model is freely available under a GPL license.


Chemical Reaction Formulation
Consider a system composed of two signalling events, with signals $S_A$ and $S_B$, a catalyst $C$, an inhibitor $I$ and outputs $O_1$ and $O_2$: (2)

Step 1: Conversion to Logical Form
This is converted into a logical form (with variables labelled with the names of their equivalent signals for convenience):

The introduced intermediate variables ($R_1$ in this example) are important in modelling inhibition, as shown in Step 3, and in modelling multiple different activating events.

Step 2: Conversion to Production Rules
The logical form must then be rewritten to produce a set of production rules for each species. Note that each signal may appear on the right-hand side of at most one logical statement in this form:

Step 3: Integration of Inhibition Information
Information about inhibition must then be incorporated into the production rules using the NOT (¬) operator. This changes equation 5 to:

Step 4: Conversion to OR-form
The system of production rules is then converted into OR-form as per Haus (2009). The following system of statements results:

Step 5: Curation
Data presented to the system may be curated, a process that can be described as follows. First, the system is converted to a graph formulation as discussed in the Materials and Methods. Each logical variable is represented as a node, and edges correspond to implies operators. For example, the logical statements described in Step 3 are presented as:

There is a strongly connected component (SCC) in this network consisting of $C \to R_1 \to C$, representing the catalytic action of $C$. The presence of the logic underlying the SCC can lead to erroneous predictions, and the system of logical statements must be modified appropriately. In this case, we can delete either the relationship $C \to R_1$ or the relationship $R_1 \to C$.
We must retain $C \to R_1$, as our formulation requires the presence of the catalyst in order for the signalling event to take place. The relationship $R_1 \to C$ (the dashed edge) represents redundant information ($R_1$ cannot be an active event unless $C$ is already active) and is safe to remove. Not all curation issues are as clear as this example, and care must be taken when modifying the logical statements so as not to introduce function-altering modifications to the network.
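The SCC check in the curation step can be sketched in Python. This is an illustration only: of the edges below, only $C \to R_1$ and $R_1 \to C$ are taken from the text, and the remaining edges, the names `implication_graph` and `strongly_connected_components`, and the choice of Tarjan's algorithm are assumptions rather than a description of the PATHLOGIC-S implementation.

```python
def strongly_connected_components(graph):
    """Return the SCCs of a directed graph given as {node: [successors]},
    using Tarjan's algorithm (recursive form, suitable for small graphs)."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop its members off the stack
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


# Hypothetical implication graph. Only C -> R_1 and R_1 -> C come from the
# text; the other edges are assumed for illustration.
implication_graph = {
    "S_A": ["R_1"],
    "C": ["R_1"],
    "R_1": ["C", "O_1"],
    "O_1": [],
}

# SCCs with more than one node are the candidates for curation.
cycles = [scc for scc in strongly_connected_components(implication_graph)
          if len(scc) > 1]

# Curation as described in the text: remove R_1 -> C and keep C -> R_1.
curated_graph = {
    node: [s for s in succs if (node, s) != ("R_1", "C")]
    for node, succs in implication_graph.items()
}
remaining = [scc for scc in strongly_connected_components(curated_graph)
             if len(scc) > 1]
```

In this toy graph the only non-trivial SCC is the pair C and R_1, and it disappears once the redundant edge is removed. Deciding which edge is redundant remains a manual judgement, as the text notes.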
The resulting system of production statements is:

Step 6: Conversion to Integer Constraints
Finally, the OR-form production rules are converted for solution in an integer program, again as per Haus (2009). For a system of production rules S of the form L → R where

For this example, the resulting constraints are:
From statement 17:
From statement 18:
From statement 19:

System inputs for this network are $S_A$, $C$, $I$ and $S_B$. System outputs are $O_1$ and $O_2$.
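As a rough illustration of how a Boolean implication can become an integer constraint, the sketch below uses the textbook encoding over 0/1 variables: a rule x → (y_1 ∨ ... ∨ y_k) becomes x ≤ y_1 + ... + y_k, with a negated literal ¬z contributing (1 − z). This is an assumption for illustration and is not claimed to be the exact construction of Haus (2009). The example rules and all function names are hypothetical. The code checks by exhaustive enumeration that the linear form agrees with the Boolean rule.

```python
from itertools import product


def literal_value(literal, assignment):
    """Value (0 or 1) of a literal; a leading '!' marks negation, i.e. 1 - x."""
    if literal.startswith("!"):
        return 1 - assignment[literal[1:]]
    return assignment[literal]


def rule_holds(lhs, rhs, assignment):
    """Boolean semantics of lhs -> (rhs[0] OR rhs[1] OR ...)."""
    return (not literal_value(lhs, assignment)) or any(
        literal_value(r, assignment) for r in rhs
    )


def constraint_holds(lhs, rhs, assignment):
    """Linear form of the same rule: value(lhs) <= sum of value(r) over rhs."""
    return literal_value(lhs, assignment) <= sum(
        literal_value(r, assignment) for r in rhs
    )


def encodings_agree(lhs, rhs):
    """Check every 0/1 assignment: the inequality holds exactly when the rule does."""
    names = sorted({lit.lstrip("!") for lit in [lhs, *rhs]})
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if rule_holds(lhs, rhs, assignment) != constraint_holds(lhs, rhs, assignment):
            return False
    return True


# Hypothetical OR-form rules, not the statements 17-19 of the paper.
agree_plain = encodings_agree("O_1", ["R_1"])          # O_1 -> R_1
agree_negated = encodings_agree("R_1", ["!I"])         # R_1 -> NOT I
agree_disjunction = encodings_agree("X", ["Y", "!Z"])  # X -> Y OR NOT Z
```

Each inequality of this kind is linear in the 0/1 variables, so a system of such rules can be passed to any integer programming solver together with an objective.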

Minimum Input
The goal of the minimum input problem is to find an assignment of variables to a set of system inputs (ACTIVE) that gives rise to a user-specified model state STATES, such that the size of ACTIVE is minimized. If the user specifies a STATES assignment that is not biologically feasible in the context of the signalling model M, then an error condition should be indicated.
Algorithm 1: Compute the set of inputs with minimal cardinality that results in a given network state.
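The problem statement can be made concrete with a small brute-force sketch. It is not Algorithm 1 and does not use integer programming: it enumerates input subsets in order of increasing size on a hypothetical toy model. The input names follow the worked example, but the rule structure in `RULES` and the functions `propagate` and `minimum_input` are assumptions made for illustration.

```python
from itertools import combinations

# System inputs as named in the worked example.
INPUTS = ["S_A", "S_B", "C", "I"]

# Hypothetical toy rules: a variable is active when all of its activators are
# active and none of its inhibitors are. This structure is assumed.
RULES = {
    "R_1": {"activators": ["S_A", "C"], "inhibitors": ["I"]},
    "O_1": {"activators": ["R_1"], "inhibitors": []},
    "O_2": {"activators": ["S_B"], "inhibitors": []},
}


def propagate(active_inputs):
    """Iterate the rules to a fixed point. The toy model has no feedback
    loops, so the iteration terminates."""
    state = {name: name in active_inputs for name in INPUTS}
    for name in RULES:
        state[name] = False
    changed = True
    while changed:
        changed = False
        for name, rule in RULES.items():
            value = all(state[a] for a in rule["activators"]) and not any(
                state[i] for i in rule["inhibitors"]
            )
            if value != state[name]:
                state[name] = value
                changed = True
    return state


def minimum_input(required_states):
    """Return a smallest set of active inputs (ACTIVE) producing the required
    states (STATES); raise ValueError if no input set does."""
    for size in range(len(INPUTS) + 1):
        for candidate in combinations(INPUTS, size):
            state = propagate(set(candidate))
            if all(state[name] == value for name, value in required_states.items()):
                return set(candidate)
    raise ValueError("requested state is not feasible in this model")


active_for_o1 = minimum_input({"O_1": True})
active_for_both = minimum_input({"O_1": True, "O_2": True})
```

Requesting a state such as `{"O_1": True, "I": True}` raises `ValueError`, which corresponds to the error condition for an infeasible STATES assignment. Exhaustive search grows exponentially with the number of inputs, which is why an integer programming formulation is preferable at database scale.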

Minimum Input Set
The minimum input set problem is that of finding the set of all distinct minimum inputs. Minimum inputs A and B are distinct if A \ B ≠ ∅ and B \ A ≠ ∅. The general approach to enumerating members of the minimum input set is outlined below:

Algorithm 2: Compute the minimum input set resulting in a given network state.
Input: LP, an integer programming solver with constraints, objective and bounds instantiated. CUTS, a set of variables in LP representing a set of integer cuts. INPUTS, a set of variables describing system inputs.
Output: A minimum input set containing all distinct minimum inputs.
    if LP has a feasible solution then
        Store the minimum input generated by LP
        for each variable $v_i$ contained in INPUTS do
            if $v_i$ is active in the solution for LP then
                CUTS' ← a copy of CUTS
                Add $v_i$ to CUTS'
                if CUTS' hasn't been processed before then
                    Recursively compute the minimum input set using CUTS'
                end if
            end if
        end for
    end if
    return the stored minimum input set

Initially, this algorithm is invoked with LP established as in the minimum input problem and CUTS = ∅, to compute the minimum input giving rise to a user-specified model state.
Successive integer cuts are then applied in order to enumerate the set of all distinct minimum inputs. If a solution to LP has $n$ inputs active, the number of possible immediate descendants is $2^n - 1$ (i.e., cuts based on the powerset of active inputs, excluding the empty set). However, infeasibility due to cutting some variable $v_i$ from the solution means that any problem where $v_i$ ∈ CUTS must also be infeasible, thus allowing us to create $n$ descendants instead of $2^n - 1$ without loss of generality or exclusion of valid solutions.
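The recursive enumeration with integer cuts can be sketched as follows. The `solve` function is a brute-force stand-in for the integer program, and the `ROUTES` table (input combinations that reach the target state) is a hypothetical toy model. Both are assumptions for illustration, as are all function names.

```python
from itertools import combinations

INPUTS = ["S_A", "S_B", "C"]

# Hypothetical model: the target state is reached whenever all inputs of at
# least one route are active.
ROUTES = [{"S_A", "C"}, {"S_B"}]


def solve(cuts):
    """Stand-in for LP: return a smallest input set that reaches the target
    state while using no variable in `cuts`, or None if infeasible."""
    allowed = [v for v in INPUTS if v not in cuts]
    for size in range(len(allowed) + 1):
        for candidate in combinations(allowed, size):
            if any(route <= set(candidate) for route in ROUTES):
                return frozenset(candidate)
    return None


def minimum_input_set(cuts=frozenset(), processed=None, found=None):
    """Enumerate distinct minimum inputs by recursively cutting, one at a
    time, each input that is active in the current solution."""
    processed = set() if processed is None else processed
    found = set() if found is None else found
    processed.add(cuts)
    solution = solve(cuts)
    if solution is not None:
        found.add(solution)
        # n descendants (one per active input) rather than 2^n - 1 subsets.
        for v in sorted(solution):
            new_cuts = cuts | {v}
            if new_cuts not in processed:
                minimum_input_set(new_cuts, processed, found)
    return found


all_minimum_inputs = minimum_input_set()
```

With CUTS = ∅ the stand-in solver returns {S_B}. Cutting S_B yields {S_A, C}, and cutting either remaining input is infeasible, so the recursion stops with two distinct minimum inputs. For a solution with n = 3 active inputs, the single-variable scheme creates 3 descendants where the powerset scheme would create $2^3 - 1 = 7$.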

Reactome
The Reactome data used in this study are dated 22nd September 2011 and comprise data for Homo sapiens in BioPAX Level 3 format sourced from Reactome.
Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins
ABC-family proteins mediated transport
SLC-mediated transmembrane transport
HIV Infection