RMut: an R package for Boolean sensitivity analysis against various types of mutations

Many in silico studies have used Boolean network models to investigate network sensitivity against gene or interaction mutations. However, there is no proper tool for examining network sensitivity against many different types of mutations, including user-defined ones. To address this issue, we developed RMut, an R package that analyzes Boolean network-based sensitivity by efficiently applying not only many well-known node-based and edgetic mutations but also novel user-defined mutations. In addition, RMut allows users to specify the mutation area and duration for more precise analysis. RMut can analyze large-scale networks because its core algorithms are parallelized with the OpenCL library. In the first case study, we observed that real biological networks were most sensitive to overexpression/state-flip mutations among node-based mutations and to edge-addition/-reverse mutations among edgetic mutations. In the second case study, we showed that edgetic mutations can predict drug targets better than node-based mutations. Finally, we examined network sensitivity to double edge-removal mutations and found an interesting synergistic effect. Taken together, these findings indicate that RMut is a flexible R package for efficiently analyzing network sensitivity to various types of mutations. RMut is available at https://github.com/csclab/RMut.

RMut-package An R package for investigating different types of mutations in network dynamics analyses.

Description
The RMut package provides several categories of functions for examining the dynamics of biological networks under many kinds of mutations. The package also computes several node/edge-based structural characteristics of the networks.

Details
This package provides functions for in-depth analysis of the dynamics of biological networks, as well as random networks, under different kinds of mutations. For these analyses, the package uses a Boolean network model with a synchronous updating scheme. The package also examines several node/edge-based structural characteristics of the networks. In summary, the package provides four main types of functions (setup functions, data handling functions, dynamics-related functions, and structure-related functions), as follows:

Setup functions
The core algorithms for dynamics analyses and for the feedback/feed-forward loop search run in parallel on the OpenCL platform, an open standard for parallel computing that runs on modern central processing units (CPUs) and graphics processing units (GPUs). Thus, the package can be used on any computer equipped with multi-core CPUs and/or GPUs that support OpenCL. The showOpencl function shows the installed OpenCL platforms and their corresponding CPU/GPU devices. The setOpencl function enables OpenCL computation by selecting a CPU/GPU device and using all of its cores for subsequent computation.

Data handling functions
Networks can be loaded in two ways in RMut: the loadNetwork function creates a network from a tab-separated values text file, or one of the example networks shipped with the package can be loaded with the data command. Furthermore, random networks can be generated with the createRBNs function using one of four generation models.
The output function exports all examined attributes of the networks to CSV files.

Dynamics-related functions
The findAttractors function identifies attractors in a network, and the resulting transition network can be exported with the output function. The exported files can then be loaded and analyzed by other software with powerful visualization features, such as Cytoscape. In addition, the package provides two other functions for manually investigating the network dynamics: perturb and restore. The perturb function applies perturbations to a set of node/edge groups in the examined network, and restore returns a set of node/edge groups in the examined network to their normal condition. Hence, these three functions allow us to manually compare the converged attractors before and after perturbations.
The calSensitivity function computes sensitivity values of node/edge groups in the examined networks. Two kinds of sensitivity measures are computed: macro-distance and bitwise-distance sensitivity. Users can apply the mutations embedded in the package or define their own. Multiple sets of random Nested Canalyzing rules can be specified, which results in multiple sensitivity values for each group.

Description
The Arabidopsis morphogenesis regulatory network (AMRN) with 10 nodes and 22 links. This regulatory network is known to robustly control the process of flower development.

data(amrn)
Format
A data frame with 22 rows and 3 variables:
Source: the identifier of the source node
Interaction: the interaction type of the edge
Target: the identifier of the target node
...

Description
Calculate centrality measures for all nodes/edges in a network or in a set of networks.

Usage
calCentrality(networks)

Arguments
networks: A network or a set of networks used for the calculation

Details
This function calculates node-/edge-based centralities of a specific network or a set of networks. For each network, the results are stored in a data frame of the network object. The data frame has one column for node/edge identifiers and nine columns containing the values of node-based centrality measures (Degree, In-Degree, Out-Degree, Closeness, Betweenness, Stress, and Eigenvector) and edge-based measures (Edge Degree and Edge Betweenness).
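The degree-type measures above can be illustrated with a minimal base-R sketch. It assumes a network stored as an edge-list data frame with Source, Interaction, and Target columns, as in the example data sets; it is not RMut's calCentrality implementation and omits the path-based measures (Closeness, Betweenness, Stress, Eigenvector) and the edge-based ones.

```r
# Hypothetical toy network in the same three-column layout as the example data.
edges <- data.frame(
  Source      = c("A", "A", "B", "C"),
  Interaction = c(1, -1, 1, 1),
  Target      = c("B", "C", "C", "A"),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(edges$Source, edges$Target)))

# Count how often each node appears as a source (out) or a target (in);
# using factor levels keeps nodes with zero counts.
out_deg <- as.vector(table(factor(edges$Source, levels = nodes)))
in_deg  <- as.vector(table(factor(edges$Target, levels = nodes)))

centrality <- data.frame(
  Node      = nodes,
  InDegree  = in_deg,
  OutDegree = out_deg,
  Degree    = in_deg + out_deg
)
print(centrality)
```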

Value
The updated network objects including values of centrality measures for each node/edge.

Description
Computes sensitivity values of node/edge groups in a network or in a set of networks, and returns the network objects with newly calculated results.

Arguments
networks: A network or a set of networks used for the calculation
stateSet: The identifier for accessing a set of initial-states
mutateMethod: The method of mutation to be performed; default is "rule flip"
groupSet: The index number of the node/edge groups whose sensitivity values are calculated. The default is 0, which specifies the most recently generated groups.
ruleFile: The path to a file containing user-defined Nested Canalyzing Function rules; default is NULL. If the path is specified, the parameter numRuleSets is forced to 1.

Details
This function computes sensitivity values of node/edge groups in a specific network or in a set of networks. Two kinds of sensitivity measures are computed: macro-distance and bitwise-distance sensitivity measures.
The calculation is based on a set of initial-states specified by the identifier stateSet. The node/edge groups in each network are determined by the index number groupSet. For example, the number 1 refers to the data frame of node/edge groups named Group_1. Several mutations are embedded in the package: "state flip", "rule flip", "outcome shuffle", "knockout", "overexpression", "edge removal", "edge attenuation", "edge addition", "edge sign switch", and "edge reverse". Users can also define their own mutations and apply them here, and they can set the operational time of a mutation with the parameter mutationTime. A synchronous updating scheme is used to calculate state transitions. One or more sets of random update-rules are generated according to the parameter numRuleSets, and a file of user-defined rules can be specified with the parameter ruleFile.
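The structural effect of some embedded edgetic mutations can be sketched as operations on an edge-list data frame. The helper names below (remove_edge, switch_sign, reverse_edge) are hypothetical, interactions are assumed to be coded 1 (activation) and -1 (inhibition), and RMut itself may apply these mutations to the update dynamics rather than by rewriting the edge list.

```r
# Hypothetical helpers; i is the row index of the mutated edge.
# Edge removal: drop the edge entirely.
remove_edge <- function(edges, i) edges[-i, , drop = FALSE]

# Edge sign switch: activation becomes inhibition and vice versa.
switch_sign <- function(edges, i) {
  edges$Interaction[i] <- -edges$Interaction[i]
  edges
}

# Edge reverse: swap source and target, keeping the sign.
reverse_edge <- function(edges, i) {
  src <- edges$Source[i]
  edges$Source[i] <- edges$Target[i]
  edges$Target[i] <- src
  edges
}

edges <- data.frame(Source = c("A", "B"), Interaction = c(1, -1),
                    Target = c("B", "C"), stringsAsFactors = FALSE)

removed  <- remove_edge(edges, 1)   # only B -| C remains
switched <- switch_sign(edges, 2)   # B -> C becomes an activation
reversed <- reverse_edge(edges, 1)  # A -> B becomes B -> A
```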
For each network, the sensitivity values are stored in the same data frame as the node/edge groups.
The data frame has one column for group identifiers (lists of nodes/edges), followed by columns containing their sensitivity values for each set of random update-rules.
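The two sensitivity measures can be illustrated on a toy network. The base-R sketch below rests on stated assumptions rather than RMut's exact definitions: each attractor is summarized by the average value of every node over its cycle, the macro-distance sensitivity is taken as the fraction of initial-states whose attractor changes under the mutation, and the bitwise-distance sensitivity as the mean node-wise difference between the original and mutated attractors. The mutation is a permanent knockout of node A, and the update rules are hypothetical.

```r
# Toy network with hypothetical rules: A' = B, B' = A, C' = A.
rules <- list(
  A = function(s) s[["B"]],
  B = function(s) s[["A"]],
  C = function(s) s[["A"]]
)

# Synchronous update: every node reads the same current state.
step <- function(state) vapply(rules, function(f) as.numeric(f(state)), numeric(1))

# Follow the trajectory until a state repeats; return node-wise averages over
# the cycle. 'mutate' is applied to every state, i.e. the mutation stays active.
find_attractor <- function(state, mutate = identity) {
  seen <- character(0)
  trajectory <- list()
  repeat {
    state <- mutate(state)
    key <- paste(state, collapse = "")
    hit <- match(key, seen)
    if (!is.na(hit)) {
      cycle <- trajectory[hit:length(trajectory)]
      return(colMeans(do.call(rbind, cycle)))
    }
    seen <- c(seen, key)
    trajectory[[length(trajectory) + 1]] <- state
    state <- step(state)
  }
}

knockout_A <- function(state) { state[["A"]] <- 0; state }

# All 2^3 initial-states as named numeric vectors.
states  <- expand.grid(A = c(0, 1), B = c(0, 1), C = c(0, 1))
initial <- lapply(seq_len(nrow(states)), function(i) unlist(states[i, ]))

original <- lapply(initial, find_attractor)
mutated  <- lapply(initial, find_attractor, mutate = knockout_A)

macro   <- mean(mapply(function(a, b) any(a != b), original, mutated))
bitwise <- mean(mapply(function(a, b) mean(abs(a - b)), original, mutated))
c(macro = macro, bitwise = bitwise)
```

Here six of the eight initial-states reach a different attractor after the knockout, so the macro-distance value is 0.75, while the bitwise-distance value (0.5) also reflects how far the attractors moved.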

Value
The updated network objects, including the sensitivity values of the examined node/edge groups.

cchs Cell cycle pathway of the species Homo sapiens.

Description
The cell cycle pathway of the species Homo sapiens (CCHS) with 161 nodes and 223 links. The cell cycle is the series of events that takes place in a cell leading to its division and duplication (replication).

data(cchs)
Format
A data frame with 223 rows and 3 variables:
Source: the identifier of the source node
Interaction: the interaction type of the edge
Target: the identifier of the target node
...

Description
The cell differentiation regulatory network (CDRN) with 9 nodes and 15 links. CDRN, which has seven positive and two negative feedback loops (FBLs), was found to robustly induce quiescence, terminal differentiation, and apoptosis.

data(cdrn)
Format
A data frame with 15 rows and 3 variables:
Source: the identifier of the source node
Interaction: the interaction type of the edge
Target: the identifier of the target node
...

Details
This function generates a set of random networks using one of four generation models: the Barabasi-Albert (BA) model [1], an Erdos-Renyi (ER) variant model [2], and two shuffling models (Shuffle 1 and Shuffle 2) [3]. Refer to the literature in the References section for more details.
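As an illustration of the simplest kind of model, the sketch below draws a random directed network in the spirit of the Erdos-Renyi G(n, m) model: a fixed number of distinct edges is chosen uniformly among all ordered node pairs (self-loops excluded), and each edge receives a random sign. The function name, node naming, and sign probability are hypothetical, and this is not necessarily the ER variant used by createRBNs.

```r
# Hypothetical G(n, m)-style generator returning a Source/Interaction/Target edge list.
random_network <- function(numNodes, numEdges, probPositive = 0.5) {
  nodes <- paste0("n", seq_len(numNodes))
  # All ordered pairs of distinct nodes are candidate edges.
  pairs <- expand.grid(Source = nodes, Target = nodes, stringsAsFactors = FALSE)
  pairs <- pairs[pairs$Source != pairs$Target, ]
  stopifnot(numEdges <= nrow(pairs))
  chosen <- pairs[sample(nrow(pairs), numEdges), ]
  data.frame(
    Source      = chosen$Source,
    Interaction = ifelse(runif(numEdges) < probPositive, 1, -1),
    Target      = chosen$Target,
    stringsAsFactors = FALSE
  )
}

set.seed(1)
net <- random_network(10, 22)
```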

Value
The generated random network objects.

Arguments
stateSet: The identifier for accessing a set of initial-states

Details
This function searches for attractors of a specific network and stores the results in a transition network object. The calculation is based on a set of initial-states specified by the identifier stateSet. The current set of update-rules of the network is used with a synchronous updating scheme.
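Because the synchronous update is deterministic, every state has exactly one successor, so the transition network can be built by updating each initial-state once, and attractors are the cycles reached by following successors. The base-R sketch below illustrates this on a hypothetical three-node network with all 2^3 initial-states; it is not RMut's findAttractors implementation.

```r
# Hypothetical rules: A' = B, B' = A, C' = A AND B.
rules <- list(
  A = function(s) s[["B"]],
  B = function(s) s[["A"]],
  C = function(s) s[["A"]] == 1 && s[["B"]] == 1
)
step   <- function(s) vapply(rules, function(f) as.numeric(f(s)), numeric(1))
to_key <- function(s) paste(s, collapse = "")

states    <- expand.grid(A = c(0, 1), B = c(0, 1), C = c(0, 1))
keys      <- apply(states, 1, to_key)
next_keys <- apply(states, 1, function(s) to_key(step(s)))

# Transition network: one edge from each state to its synchronous successor.
transitions <- data.frame(From = keys, To = next_keys, stringsAsFactors = FALSE)
succ <- setNames(next_keys, keys)

# Follow successors until a state repeats; the repeated part is the attractor.
find_cycle <- function(start) {
  path <- character(0)
  k <- start
  while (!(k %in% path)) {
    path <- c(path, k)
    k <- succ[[k]]
  }
  sort(path[match(k, path):length(path)])
}
attractors <- unique(lapply(keys, find_cycle))
```

This network has two point attractors ("000" and "111") and one cyclic attractor alternating between "100" and "010".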

Value
The resulting transition network.

Details
This function searches for feedback loops (FBLs) in a specific network or in a set of networks. For each network, the results are stored in two corresponding data frames of node/edge attributes. Each data frame has one column for node/edge identifiers and three columns containing the number of FBLs in which the node/edge is involved and the numbers of involved positive and negative FBLs. Another data frame of the network object, the "network" data frame, contains the total number of FBLs and the total numbers of positive and negative FBLs in the network.
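The loop search and sign classification can be sketched by enumerating simple directed cycles with a depth-first search; a loop is treated as positive when it contains an even number of inhibitory edges, that is, when the product of its edge signs is +1. This exhaustive search is exponential in the worst case and only shows the counting on a toy network; it assumes interactions coded as 1/-1 and is not RMut's parallel implementation.

```r
# Hypothetical toy network: A -> B (+), B -> A (-), B -> C (+), C -> B (+).
edges <- data.frame(
  Source      = c("A", "B", "B", "C"),
  Interaction = c(1, -1, 1, 1),
  Target      = c("B", "A", "C", "B"),
  stringsAsFactors = FALSE
)
nodes <- sort(unique(c(edges$Source, edges$Target)))

# Enumerate each simple cycle once, rooted at its alphabetically first node.
find_loops <- function(edges, nodes) {
  loops <- list()
  dfs <- function(start, current, path, sign) {
    out <- edges[edges$Source == current, ]
    for (i in seq_len(nrow(out))) {
      nxt <- out$Target[i]
      s <- sign * out$Interaction[i]
      if (nxt == start) {
        loops[[length(loops) + 1]] <<- list(nodes = path, sign = s)
      } else if (!(nxt %in% path) && match(nxt, nodes) > match(start, nodes)) {
        dfs(start, nxt, c(path, nxt), s)
      }
    }
  }
  for (v in nodes) dfs(v, v, v, 1)
  loops
}

loops <- find_loops(edges, nodes)
signs <- vapply(loops, function(l) l$sign, numeric(1))
c(total = length(loops), positive = sum(signs > 0), negative = sum(signs < 0))

# Number of FBLs in which each node is involved.
per_node <- vapply(nodes, function(v)
  sum(vapply(loops, function(l) v %in% l$nodes, logical(1))), numeric(1))
```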

Value
The updated network objects including number of FBLs for each node/edge.

Description
Searches and counts feedforward loops for all nodes/edges in a network or in a set of networks.

Details
This function searches for feedforward loops (FFLs) in a specific network or in a set of networks. For each network, the results are stored in two corresponding data frames of node/edge attributes. Each data frame has one column for node/edge identifiers and four columns containing the total number of FFL motifs in which the node/edge is involved and the numbers of FFL motifs for each of three different roles, A, B, and C. Another data frame of the network object, the "network" data frame, contains the total numbers of FFLs and of coherent and incoherent FFLs in the network.
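The FFL counting can be sketched similarly: an FFL consists of edges A -> B, B -> C, and A -> C on three distinct nodes, and it is coherent when the sign of the direct edge A -> C equals the product of the signs along the indirect path A -> B -> C. The base-R sketch below assumes interactions coded as 1/-1 and uses a hypothetical toy network; it is not RMut's implementation.

```r
# Hypothetical toy network: X -> Y (+), Y -> Z (+), X -> Z (+), Y -> W (+), X -> W (-).
edges <- data.frame(
  Source      = c("X", "Y", "X", "Y", "X"),
  Interaction = c(1, 1, 1, 1, -1),
  Target      = c("Y", "Z", "Z", "W", "W"),
  stringsAsFactors = FALSE
)
sign_of <- setNames(edges$Interaction, paste(edges$Source, edges$Target))

ffls <- list()
for (i in seq_len(nrow(edges))) {            # candidate edge A -> B
  a <- edges$Source[i]
  b <- edges$Target[i]
  for (j in which(edges$Source == b)) {      # candidate edge B -> C
    tgt <- edges$Target[j]
    direct <- sign_of[paste(a, tgt)]         # NA when there is no edge A -> C
    if (a != b && tgt != a && tgt != b && !is.na(direct)) {
      indirect <- edges$Interaction[i] * edges$Interaction[j]
      ffls[[length(ffls) + 1]] <- data.frame(A = a, B = b, C = tgt,
        Coherent = unname(direct) == indirect, stringsAsFactors = FALSE)
    }
  }
}
ffls <- do.call(rbind, ffls)
c(total = nrow(ffls), coherent = sum(ffls$Coherent), incoherent = sum(!ffls$Coherent))

# How often each node plays role A, B, or C.
nodes <- sort(unique(c(edges$Source, edges$Target)))
role_count <- function(x) as.vector(table(factor(x, levels = nodes)))
roles <- data.frame(Node = nodes, RoleA = role_count(ffls$A),
                    RoleB = role_count(ffls$B), RoleC = role_count(ffls$C))
```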

Value
The updated network objects including number of FFL motifs for each node/edge.

Description
Generate a specific group of nodes/edges in a network.

Arguments
network: The network that contains the generated group
nodes: A list of nodes in the generated group
edges: A list of edges in the generated group

Details
This function generates a specific group of elements in a network. The group is used to analyze the dynamics of the examined network, for example, to calculate sensitivity, perturb the network, or restore the network to its original condition. The element group contains only nodes, only edges, or a combination of nodes and edges.

Description
Generate random groups of nodes/edges in a network or in a set of networks.

Usage
generateGroups(networks, numGroups, nodeSize = 1, edgeSize = 0, newEdges = FALSE)

Arguments
networks: A network or a set of networks that will contain the generated groups
numGroups: The number of random groups to be generated for each network. If set to "all", all possible groups are generated.

Details
This function generates random groups of elements in a network or in a set of networks. The groups are used to analyze the dynamics of the examined networks, for example, to calculate sensitivity, perturb a network, or restore a network to its original condition. Each element group contains only nodes, only edges, or a combination of nodes and edges.
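A minimal sketch of random node groups: all combinations of nodeSize nodes are enumerated with combn, and numGroups distinct groups are sampled from them, or all are returned when numGroups is "all". The function name and node identifiers are hypothetical, edge groups are omitted, and enumerating every combination is only practical for small networks, so this is not necessarily how generateGroups works.

```r
# Hypothetical helper returning a list of node groups (character vectors).
generate_node_groups <- function(nodes, numGroups, nodeSize = 1) {
  # choose(length(nodes), nodeSize) possible groups.
  all_groups <- combn(nodes, nodeSize, simplify = FALSE)
  if (identical(numGroups, "all")) return(all_groups)
  # Sample without replacement so the returned groups are distinct.
  all_groups[sample(length(all_groups), min(numGroups, length(all_groups)))]
}

set.seed(42)
groups <- generate_node_groups(LETTERS[1:10], numGroups = 5, nodeSize = 2)
```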

Value
The updated network objects, including the generated groups.
See Also
calSensitivity, perturb, restore

Details
This function generates a default set of update-rules for a network. The rules are used to analyze the dynamics of the examined network, for example, to calculate sensitivity or search for attractors. The type of random update-rules is specified by the parameter ruleType: 0 means only Conjunction rules, 1 means only Disjunction rules, and 2 means random Nested Canalyzing rules.
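The three rule types can be sketched as Boolean functions of a node's input values. The conventions below are assumptions, not RMut's code: an input counts as "effective" when an activator (sign 1) is ON or an inhibitor (sign -1) is OFF, and the nested canalyzing function follows the textbook form in which inputs are checked in order and the default output is the complement of the last canalyzed value. Sampling canalyzing and canalyzed values uniformly in random_ncf is also a hypothetical choice.

```r
# Effective input: activator ON or inhibitor OFF (assumed convention).
effective <- function(values, signs) ifelse(signs > 0, values == 1, values == 0)

# Conjunction: ON only if every input is effective.
conjunction <- function(values, signs) as.numeric(all(effective(values, signs)))

# Disjunction: ON if at least one input is effective.
disjunction <- function(values, signs) as.numeric(any(effective(values, signs)))

# Nested canalyzing function: the first input equal to its canalyzing value
# fixes the output to its canalyzed value; otherwise return the complement
# of the last canalyzed value.
ncf <- function(values, canalyzing, canalyzed) {
  for (i in seq_along(values)) {
    if (values[i] == canalyzing[i]) return(canalyzed[i])
  }
  1 - canalyzed[length(canalyzed)]
}

# Hypothetical random NCF for k inputs.
random_ncf <- function(k) list(canalyzing = sample(0:1, k, replace = TRUE),
                               canalyzed  = sample(0:1, k, replace = TRUE))

rule <- random_ncf(2)
ncf(c(1, 0), rule$canalyzing, rule$canalyzed)
```

With canalyzing and canalyzed values all 0, the NCF reduces to an AND of its inputs, and with all values 1 it reduces to an OR.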

Value
The string "ok" on success; otherwise, a NULL object.
See Also

Description
Generate a specific initial-state for a network.

Arguments
network: The network used for the generation
state: A binary string with one entry for each node of the network, with the nodes taken in alphabetical order

Details
This function generates a specific initial-state for a network. The initial-state is used to analyze the dynamics of the examined network, for example, to calculate sensitivity or search for attractors.
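A binary state string can be mapped onto nodes as sketched below, assuming the i-th character corresponds to the i-th node after sorting the node identifiers alphabetically (with R's sort, whose ordering can depend on the locale). The helper name and node identifiers are hypothetical.

```r
# Hypothetical helper: named 0/1 vector from a binary string.
parse_state <- function(nodes, state) {
  bits   <- as.integer(strsplit(state, "")[[1]])
  sorted <- sort(nodes)
  stopifnot(length(bits) == length(sorted), all(bits %in% c(0L, 1L)))
  setNames(bits, sorted)
}

s <- parse_state(c("TFL1", "AP1", "LFY"), "101")
s  # AP1 = 1, LFY = 0, TFL1 = 1
```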

Value
An identifier for accessing the generated initial-state. The identifier is passed as the stateSet parameter to the functions that calculate sensitivity and find attractors.

Description
Generate random initial-states for a network or a set of networks.

Usage
generateStates(numNodes, numStates)

Arguments
numNodes: The number of nodes in each initial-state, or a network object
numStates: The number of random initial-states to be generated. If set to "all", all possible initial-states (2^n for a network with n nodes) are generated. For large networks, a specific value should be used because of memory limitations.

Details
This function generates random initial-states for a network or a set of networks. The initial-states are used to analyze the dynamics of the examined networks, for example, to calculate sensitivity or search for attractors.
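A sketch of random and exhaustive initial-state generation as binary strings. With "all", 2^numNodes strings are produced, which illustrates why a specific numStates value is needed for large networks. The function name is hypothetical, and randomly drawn states may repeat.

```r
# Hypothetical helper returning initial-states as binary strings.
generate_states <- function(numNodes, numStates) {
  if (identical(numStates, "all")) {
    # Every combination of 0/1 values: 2^numNodes rows.
    grid <- expand.grid(rep(list(0:1), numNodes))
    return(apply(grid, 1, paste, collapse = ""))
  }
  replicate(numStates, paste(sample(0:1, numNodes, replace = TRUE), collapse = ""))
}

set.seed(7)
random_states <- generate_states(5, 10)
all_states    <- generate_states(3, "all")
```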

Value
An identifier for accessing the generated initial-states. The identifier is passed as the stateSet parameter to the functions that calculate sensitivity and find attractors.

Description
The large-scale human signaling network (HSN) with 1192 nodes and 3102 links. This network was used to derive general principles for understanding protein evolution in the context of signaling networks.

data(hsn)
Format
A data frame with 3102 rows and 3 variables:
Source: the identifier of the source node
Interaction: the interaction type of the edge
Target: the identifier of the target node
...

Description
initJVM initializes the Java Virtual Machine (JVM). This function must be called before any RMut functions can be used.

Details
This function initializes the JVM with the maximum Java heap size given by the parameter maxHeapSize. The parameter is a string consisting of a number followed by the letter K, M, or G (kilobytes, megabytes, or gigabytes, respectively).
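A hypothetical validator for the maxHeapSize string, assuming the JVM convention in which K, M, and G denote multiples of 1024 bytes. It only checks the format described above and does not start a JVM.

```r
# Hypothetical helper: convert "512M"-style strings to bytes, or stop on bad input.
heap_size_bytes <- function(maxHeapSize) {
  m <- regmatches(maxHeapSize, regexec("^([0-9]+)([KMG])$", maxHeapSize))[[1]]
  if (length(m) == 0) stop("maxHeapSize must be a number followed by K, M, or G")
  multiplier <- c(K = 1024, M = 1024^2, G = 1024^3)[[m[3]]]
  as.numeric(m[2]) * multiplier
}

heap_size_bytes("512M")
```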
Value
TRUE denotes successful initialization, and FALSE indicates failure.

Description
loadNetwork loads a network from a file and returns the network object.

Arguments
pathToFile: The path to the network file

Details
This function loads a network from a tab-separated values text file and returns the network object. The file has three columns: source, interaction type, and target. "Source" and "target" are gene/protein identifiers that define the nodes, while "interaction type" labels the edge connecting each pair of nodes. The returned network object contains the network name and three data frames storing the node, edge, and network attributes, respectively.
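Reading the three-column format can be sketched with read.table. This assumes the file has no header row and that interaction types are numeric codes such as 1 and -1; the function name and the list element names (nodes, edges, network) are hypothetical, and the sketch is not RMut's loadNetwork implementation.

```r
# Hypothetical loader returning a list that mirrors the network object layout.
load_tsv_network <- function(pathToFile) {
  edges <- read.table(pathToFile, sep = "\t", header = FALSE,
                      col.names = c("Source", "Interaction", "Target"),
                      stringsAsFactors = FALSE, quote = "", comment.char = "")
  nodes <- data.frame(Node = sort(unique(c(edges$Source, edges$Target))),
                      stringsAsFactors = FALSE)
  list(name    = tools::file_path_sans_ext(basename(pathToFile)),
       nodes   = nodes,
       edges   = edges,
       network = data.frame(NumNodes = nrow(nodes), NumEdges = nrow(edges)))
}

path <- tempfile(fileext = ".txt")
writeLines(c("A\t1\tB", "B\t-1\tC", "C\t1\tA"), path)
net <- load_tsv_network(path)
```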

Value
The network object.

Description
output writes all node/edge/network attributes of a network or a set of networks into CSV files.

Arguments
networks: The network or the set of networks

Details
This function writes all node/edge/network attributes of a network or a set of networks into CSV files. For each network, the function exports all data frames of the network object, which contain the structural attributes of the nodes/edges/network and the sensitivity values of mutated groups.
The CSV files are named as follows: [network name]_out_[data-frame name].csv. The network structures are also exported as tab-separated values text files (.SIF extension).
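The naming rule can be sketched with write.csv and write.table. The network object below is a hypothetical list with data frames named nodes, edges, and network, and the .sif file name is an assumption; the actual data-frame names and file layout used by output may differ.

```r
# Hypothetical exporter following "[network name]_out_[data-frame name].csv".
export_network <- function(net, dir = tempdir()) {
  frames <- net[c("nodes", "edges", "network")]
  paths  <- file.path(dir, paste0(net$name, "_out_", names(frames), ".csv"))
  for (i in seq_along(frames)) write.csv(frames[[i]], paths[i], row.names = FALSE)
  # Network structure as a tab-separated file: source, interaction, target.
  sif <- file.path(dir, paste0(net$name, ".sif"))
  write.table(net$edges[, c("Source", "Interaction", "Target")], sif,
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
  c(paths, sif)
}

net <- list(
  name    = "toy",
  nodes   = data.frame(Node = c("A", "B")),
  edges   = data.frame(Source = "A", Interaction = 1, Target = "B"),
  network = data.frame(NumNodes = 2, NumEdges = 1)
)
files <- export_network(net)
```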

Description
Perturb a set of node/edge groups in a network.

Arguments
network: The network that contains the node/edge groups
groupSet: The index number of the node/edge groups in the network
mutateMethod: The method of mutation to be performed; default is "rule flip"

Details
This function perturbs a set of node/edge groups in a network. The parameters groupSet and mutateMethod have the same meanings as in the calSensitivity function.
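The interplay of perturb and restore can be sketched with a registry of active mutations that is applied to each state: perturbing a group adds an entry, and restoring the group removes it so its nodes follow their normal rules again. All names here are hypothetical, and only the knockout and overexpression mutations, which fix node values, are modelled.

```r
# Node-based mutations that fix node values (hypothetical helper).
mutate_nodes <- function(state, nodes, method) {
  switch(method,
         "knockout"       = replace(state, nodes, 0),
         "overexpression" = replace(state, nodes, 1),
         stop("unsupported mutation method: ", method))
}

# Registry of active perturbations keyed by the group's node list.
perturb_group <- function(active, nodes, method) {
  active[[paste(nodes, collapse = ",")]] <- list(nodes = nodes, method = method)
  active
}
restore_group <- function(active, nodes) {
  active[[paste(nodes, collapse = ",")]] <- NULL
  active
}

# Applied to every state during a simulation, so mutated nodes stay fixed.
apply_mutations <- function(state, active) {
  for (m in active) state <- mutate_nodes(state, m$nodes, m$method)
  state
}

state  <- c(A = 1, B = 0, C = 1)
active <- perturb_group(list(), "A", "knockout")
apply_mutations(state, active)   # A is forced to 0
active <- restore_group(active, "A")
```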

Value
None. Error messages or other information are printed to the screen.
See Also
restore, generateGroups, generateGroup, calSensitivity, findAttractors

groupSet: The index number of the node/edge groups whose sensitivity values are calculated. The default is 0, which specifies the most recently generated groups.

Details
This function prints the sensitivity values of node/edge groups in a specific network. The parameter groupSet has the same meaning as in the calSensitivity function.

None
See Also
restore

Restore a set of node/edge groups.

Description
Restore a set of node/edge groups in a network.

Arguments
network: The network that contains the node/edge groups
groupSet: The index number of the node/edge groups in the network

Details
This function restores a set of node/edge groups in a network to their normal condition. The parameter groupSet has the same meaning as in the calSensitivity function.

Value
None. Error messages or other information are printed to the screen.

Details
This function enables OpenCL computation by selecting a CPU/GPU device and using all of its cores for subsequent computation, so that tasks are executed in parallel. The parameter deviceType has three options: 'none' disables OpenCL, 'cpu' selects a CPU device, and 'gpu' selects a GPU device.

Value
Information about the successfully selected device.

Description
showOpencl gets OpenCL information and prints it to the console.

Details
This function gets OpenCL information, for example, the installed OpenCL platforms and their corresponding CPU/GPU devices, and prints it to the console.

A string of OpenCL information
See Also