The authors have declared that no competing interests exist.
Wrote the paper: ED OB IR BAA GDB CS. Designed software: ED OB IR. Developed components and features: ED OB IR BAA KIF BG GDB. Tested features: ED OB IR BAA KIF BG OSS GDB. Wrote User's Guide and examples: ED OB IR OSS.
A rapidly growing corpus of formal, computable pathway information can be used to answer important biological questions, including finding non-trivial connections between cellular processes, identifying significantly altered portions of the cellular network in a disease state and building predictive models that can be used for precision medicine. Due to its complexity and fragmented nature, however, working with pathway data is still difficult. We present Paxtools, a Java library that contains algorithms, software components and converters for biological pathways represented in the standard BioPAX language. Paxtools allows scientists to focus on their scientific problem by removing technical barriers to accessing and analyzing pathway information. Paxtools can run on any platform that has a Java Runtime Environment and was tested on most modern operating systems. Paxtools is open source and is available under the GNU Lesser General Public License (LGPL), which allows users to freely use the code in their software systems with a requirement for attribution. Source code for the current release (4.2.0) can be found in Software S1. A detailed manual for obtaining and using Paxtools can be found in Protocol S1. The latest sources and release bundles can be obtained from biopax.org/paxtools.
The total volume of computational pathway data mapped by biologists has entered a rapid growth phase. There are more than 100,000 biochemical reactions and more than a million molecular interactions in 325 publicly available online resources
Despite these promising studies, the use of computable pathway information in biology is still in its infancy. Pathway data sources were originally developed to use their own internal pathway representations, resulting in a heterogeneous set of resources that are difficult to combine and use. There are two key technical challenges we need to address to streamline the use of this rapidly expanding information: (i) standardizing and integrating pathway data from multiple heterogeneous sources and (ii) developing methods and tools that can help scientists easily access, map and analyze pathway information.
BioPAX (Biological Pathway Exchange)
The second key challenge is to build methods, algorithms and software tools to work with pathway data to answer biological questions
At the core of Paxtools is a complete and consistent object-oriented implementation of the BioPAX specification. A BioPAX record includes pathways, interactions, reactions and their participants. These elements are implemented as Java beans that have methods to access and manipulate the properties as described in the BioPAX specification. The Model class acts as a container that allows adding, removing and searching for elements.
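The container idea can be illustrated with a minimal sketch. The classes below (SketchModel, SketchProtein, SketchSmallMolecule) are hypothetical stand-ins written for this illustration and are not the Paxtools Model or BioPAX bean classes; the sketch only assumes that elements are beans registered under a unique identifier and searchable by type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for BioPAX element beans; not Paxtools classes.
class SketchProtein {
    private String displayName;
    String getDisplayName() { return displayName; }
    void setDisplayName(String displayName) { this.displayName = displayName; }
}

class SketchSmallMolecule {
    private String displayName;
    String getDisplayName() { return displayName; }
    void setDisplayName(String displayName) { this.displayName = displayName; }
}

// A minimal container that stores elements by identifier (illustrative only).
class SketchModel {
    private final Map<String, Object> elementsById = new LinkedHashMap<>();

    // Registers an element; identifiers must be unique within the model.
    void add(String id, Object element) {
        Objects.requireNonNull(element, "element");
        if (elementsById.putIfAbsent(id, element) != null) {
            throw new IllegalArgumentException("Duplicate identifier: " + id);
        }
    }

    // Removes and returns the element with the given identifier, or null.
    Object remove(String id) { return elementsById.remove(id); }

    Object getById(String id) { return elementsById.get(id); }

    // Returns all elements that are instances of the requested type.
    <T> List<T> getObjects(Class<T> type) {
        List<T> hits = new ArrayList<>();
        for (Object element : elementsById.values()) {
            if (type.isInstance(element)) hits.add(type.cast(element));
        }
        return hits;
    }

    public static void main(String[] args) {
        SketchModel model = new SketchModel();
        SketchProtein p53 = new SketchProtein();
        p53.setDisplayName("p53");
        model.add("protein_1", p53);
        model.add("molecule_1", new SketchSmallMolecule());
        System.out.println(model.getObjects(SketchProtein.class).size());
    }
}
```

Searching by class rather than by a string type name keeps the lookup type-safe, which is one plausible reason to model elements as beans in the first place.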
BioPAX uses OWL semantics that are not automatically covered by object-oriented languages such as Java. Properties can be symmetric, transitive or subtyped into other properties. For example, a protein binding relationship is symmetric: if A binds to B, then B also binds to A. Likewise, the “standard name” property is a subtype of the “name” property, so updating the standard name of a protein should also update its list of names. Paxtools implements these additional semantics and automatically updates object fields.
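As a rough illustration of how such semantics can be enforced inside ordinary setters, consider the sketch below. NamedEntity and its methods are hypothetical names chosen for this example, not the Paxtools implementation; the sketch assumes that a new standard name is simply added to the name list and that binding is recorded on both partners.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical toy class; it only mimics the two semantics discussed in the text.
class NamedEntity {
    private final Set<String> names = new LinkedHashSet<>();
    private final Set<NamedEntity> bindsTo = new LinkedHashSet<>();
    private String standardName;

    // Sub-property: every standard name is also a name,
    // so setting it adds the value to the name list as well.
    void setStandardName(String newName) {
        standardName = newName;
        if (newName != null) names.add(newName);
    }

    String getStandardName() { return standardName; }

    void addName(String name) { names.add(name); }

    Set<String> getNames() { return Collections.unmodifiableSet(names); }

    // Symmetric property: recording "this binds other"
    // also records "other binds this".
    void addBindsTo(NamedEntity other) {
        if (bindsTo.add(other)) other.bindsTo.add(this);
    }

    Set<NamedEntity> getBindsTo() { return Collections.unmodifiableSet(bindsTo); }

    public static void main(String[] args) {
        NamedEntity a = new NamedEntity();
        NamedEntity b = new NamedEntity();
        a.setStandardName("TP53");
        a.addBindsTo(b);
        System.out.println(a.getNames() + " " + b.getBindsTo().contains(a));
    }
}
```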
In the BioPAX specification, properties are unidirectional for brevity. For example, the “participant” property links interactions to physical entities. Paxtools provides additional “inverse” links for key properties that allow efficient bidirectional navigation. These inverse links are also automatically updated when the forward link is updated.
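A minimal sketch of this bookkeeping follows. ToyInteraction and ToyEntity are hypothetical classes, and the inverse accessor name getParticipantOf is an assumption made for the illustration; the point is only that the forward setter is the single place where both directions are updated.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical physical entity that remembers which interactions it takes part in.
class ToyEntity {
    final String id;
    private final Set<ToyInteraction> participantOf = new LinkedHashSet<>();

    ToyEntity(String id) { this.id = id; }

    // Read-only inverse view; callers cannot edit it directly.
    Set<ToyInteraction> getParticipantOf() {
        return Collections.unmodifiableSet(participantOf);
    }

    // Hooks called only by ToyInteraction to keep both directions in sync.
    void linkInverse(ToyInteraction interaction) { participantOf.add(interaction); }
    void unlinkInverse(ToyInteraction interaction) { participantOf.remove(interaction); }
}

// Hypothetical interaction that owns the forward "participant" property.
class ToyInteraction {
    final String id;
    private final Set<ToyEntity> participants = new LinkedHashSet<>();

    ToyInteraction(String id) { this.id = id; }

    void addParticipant(ToyEntity entity) {
        if (participants.add(entity)) entity.linkInverse(this);
    }

    void removeParticipant(ToyEntity entity) {
        if (participants.remove(entity)) entity.unlinkInverse(this);
    }

    Set<ToyEntity> getParticipants() {
        return Collections.unmodifiableSet(participants);
    }

    public static void main(String[] args) {
        ToyEntity p53 = new ToyEntity("p53");
        ToyInteraction reaction = new ToyInteraction("phosphorylation");
        reaction.addParticipant(p53);
        System.out.println(p53.getParticipantOf().size());
    }
}
```

Exposing the inverse side as a read-only view is a design choice of this sketch: it prevents the two directions from drifting apart through direct edits.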
Paxtools includes two input/output handlers for reading and writing BioPAX files. The Jena-based IO handler can handle most OWL encodings but can be relatively slow and demands a significant amount of memory for very large files. The StAX-based “Simple” IO handler can only read BioPAX formatted as RDF/XML and lacks the flexibility of the Jena handler, but it can read hundreds of thousands of elements within minutes with a very small memory footprint.
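The low memory footprint of a streaming parser comes from visiting one XML event at a time instead of building a full document tree. The sketch below uses the standard javax.xml.stream API to count typed resources in an RDF/XML string; StaxElementCounter is a hypothetical class written for this illustration and is not the Paxtools “Simple” IO handler, and treating any element with an rdf:ID or rdf:about attribute as a resource is a simplifying assumption.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams over RDF/XML and counts resources per element name.
class StaxElementCounter {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static Map<String, Integer> countResources(String rdfXml) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(rdfXml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Assumption: an element identified by rdf:ID or rdf:about
                    // declares a resource; property elements carry neither.
                    String id = reader.getAttributeValue(RDF_NS, "ID");
                    if (id == null) id = reader.getAttributeValue(RDF_NS, "about");
                    if (id != null) counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
                + " xmlns:bp=\"http://www.biopax.org/release/biopax-level3.owl#\">"
                + "<bp:Protein rdf:ID=\"p1\"/>"
                + "</rdf:RDF>";
        System.out.println(countResources(xml));
    }
}
```

Only the running counts are held in memory here, so the cost grows with the number of distinct element names rather than with file size.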
On top of this foundation Paxtools implements methods, algorithms and tools to solve common tasks encountered while working with pathway data (
Paxtools facilitates working with pathway information by removing technical barriers and acts as a common development platform for other tools and algorithms.
The “Controller” package allows users to manipulate BioPAX models without actually hard coding property and class names, define XML Path Language-like queries and traverse the object graph efficiently. An example which implements an object inspector to display the properties of objects can be found in the user's guide (Text S1). This pattern considerably simplifies development of BioPAX exporters and other tools and makes it easier to extend and update them to support future changes in the BioPAX specification.
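The general idea of inspecting an object without hard-coding its property names can be sketched with plain Java reflection. ReflectiveInspector and DemoProtein below are hypothetical and do not use the classes of the “Controller” package; the sketch assumes bean-style public getters with no arguments.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical inspector: discovers bean properties at run time via reflection.
class ReflectiveInspector {
    static Map<String, Object> inspect(Object bean) {
        Map<String, Object> values = new TreeMap<>();
        for (Method method : bean.getClass().getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && method.getParameterCount() == 0
                    && method.getDeclaringClass() != Object.class
                    && !Modifier.isStatic(method.getModifiers());
            if (!isGetter) continue;
            try {
                // "getDisplayName" becomes the property name "displayName".
                String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                values.put(property, method.invoke(bean));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read property via " + name, e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(inspect(new DemoProtein("TP53", "Homo sapiens")));
    }
}

// Hypothetical bean used only for the demonstration.
class DemoProtein {
    private final String displayName;
    private final String organism;

    DemoProtein(String displayName, String organism) {
        this.displayName = displayName;
        this.organism = organism;
    }

    public String getDisplayName() { return displayName; }
    public String getOrganism() { return organism; }
}
```

Because the inspector never names a property explicitly, a new getter added to the bean is picked up without changing the inspector, which mirrors the maintainability argument made above.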
Paxtools offers an array of graph search algorithms that are specifically targeted towards answering common biological questions. For example, a mutually exclusive relationship between the mutations observed in two genes in a set of drug-resistant tumors can be explained by a common downstream pathway. Similarly, an observed correlation in gene expression between two genes can be caused by a signaling path that connects one gene to the control of the expression of the other. Starting from a set of entities, users can find common upstream and downstream events, feedback loops, connecting paths and networks of interest (
These cross-talks are difficult to find manually without a graph search because of the large number of connections to/from AR and p53 and because of fragmentation of data across multiple pathway data sources.
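A common-downstream search of the kind mentioned above can be sketched as a depth-limited breadth-first traversal from each seed followed by a set intersection. PathwayGraph is a hypothetical class operating on a plain directed graph of string identifiers; it is not one of the Paxtools query algorithms, which work on the full BioPAX object graph.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical directed graph with a common-downstream query.
class PathwayGraph {
    private final Map<String, Set<String>> downstream = new HashMap<>();

    void addEdge(String from, String to) {
        downstream.computeIfAbsent(from, key -> new LinkedHashSet<>()).add(to);
    }

    // Nodes reachable from the source within "limit" steps, excluding the source.
    Set<String> reachableFrom(String source, int limit) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(source);
        for (int depth = 0; depth < limit && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String target : downstream.getOrDefault(node, Collections.emptySet())) {
                    if (!target.equals(source) && seen.add(target)) next.add(target);
                }
            }
            frontier = next;
        }
        return seen;
    }

    // Nodes that lie downstream of every seed within the distance limit.
    Set<String> commonDownstream(Collection<String> seeds, int limit) {
        Set<String> common = null;
        for (String seed : seeds) {
            Set<String> reach = reachableFrom(seed, limit);
            if (common == null) common = new TreeSet<>(reach);
            else common.retainAll(reach);
        }
        return common == null ? new TreeSet<>() : common;
    }

    public static void main(String[] args) {
        PathwayGraph graph = new PathwayGraph();
        graph.addEdge("A", "C");
        graph.addEdge("B", "D");
        graph.addEdge("C", "E");
        graph.addEdge("D", "E");
        System.out.println(graph.commonDownstream(java.util.Arrays.asList("A", "B"), 3));
    }
}
```

The distance limit matters in practice: in a densely connected network, an unbounded search tends to return most of the graph and loses explanatory value.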
Working with BioPAX often requires complex manipulations to merge models from different sources, extract subparts or reduce them to simpler formats (
Different rules should be used depending on the biological question at hand. For example, a State Change interaction is inferred when the rule detects phosphorylation of p53 by p38. This relationship is useful for analyzing protein signaling cascades from proteomics data. The “Sequential Catalysis” rule, on the other hand, links entities that catalyze consecutive reactions. This relationship occurs frequently in metabolic pathways and is useful for metabolomic studies where changes in the concentration of substrates are observed.
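To make the idea of a reduction rule concrete, the sketch below infers sequential-catalysis pairs from a list of toy reactions: two catalysts are linked when a product of the first reaction is a substrate of the second. ToyReaction, SequentialCatalysisRule and the edge label string are assumptions made for this illustration; the actual Paxtools rule operates on BioPAX objects and may apply additional conditions.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified reaction: names of substrates, products and catalysts.
class ToyReaction {
    final Set<String> substrates;
    final Set<String> products;
    final Set<String> catalysts;

    ToyReaction(Set<String> substrates, Set<String> products, Set<String> catalysts) {
        this.substrates = substrates;
        this.products = products;
        this.catalysts = catalysts;
    }
}

// Hypothetical rule: emits tab-separated "source, type, target" lines.
class SequentialCatalysisRule {
    static Set<String> infer(List<ToyReaction> reactions) {
        Set<String> sif = new TreeSet<>();
        for (ToyReaction first : reactions) {
            for (ToyReaction second : reactions) {
                // Consecutive means: some product of "first" is consumed by "second".
                if (first == second
                        || Collections.disjoint(first.products, second.substrates)) {
                    continue;
                }
                for (String upstream : first.catalysts) {
                    for (String downstreamCatalyst : second.catalysts) {
                        if (!upstream.equals(downstreamCatalyst)) {
                            sif.add(upstream + "\tSEQUENTIAL_CATALYSIS\t" + downstreamCatalyst);
                        }
                    }
                }
            }
        }
        return sif;
    }

    public static void main(String[] args) {
        ToyReaction hexokinase = new ToyReaction(Collections.singleton("glucose"),
                Collections.singleton("G6P"), Collections.singleton("HK1"));
        ToyReaction isomerase = new ToyReaction(Collections.singleton("G6P"),
                Collections.singleton("F6P"), Collections.singleton("GPI"));
        System.out.println(infer(java.util.Arrays.asList(hexokinase, isomerase)));
    }
}
```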
Paxtools can convert Proteomics Standards Initiative - Molecular Interaction (PSI-MI)
The recently released BioPAX Level 3 introduced significant extensions to previous levels with some loss of backwards compatibility. Paxtools supports all three BioPAX levels (1, 2 and 3) and provides facilities for upgrading older BioPAX models to Level 3.
Each operation that modifies the model is internally validated by Paxtools to comply with BioPAX syntax, including RDF well-formedness, domain and range restrictions, bidirectional links, and redundancies. This is especially useful for tools that create BioPAX such as pathway editors, converters or exporters as it allows early detection of errors, increasing the quality of the produced BioPAX data. This syntactic validation does not include more detailed checking of semantics that is performed by the BioPAX validator.
Paxtools comes with mappings for the Java Persistence API that allow out-of-the-box persistence of BioPAX models into relational database systems, such as MySQL. This is an especially important feature for working with large BioPAX models, such as complete pathway database exports.
Paxtools source code is currently distributed as a modular Maven project which allows developers to easily select just the parts of Paxtools they need in their application. The Paxtools core module, which provides a complete implementation of BioPAX and read in/write out functionality, is a very compact library (400 kB) that can be expanded as required with additional modules, such as those described above.
Paxtools implements all of these features in a lightweight, extensible structure. Paxtools can be embedded in a software tool as a library, used for programmatic access to pathway data from Java or Jython, or used to run the most commonly used Paxtools algorithms and searches from the command line. Here, we briefly review several applications where Paxtools was used to answer biological questions or develop bioinformatics tools.
When an experimental method identifies a set of genes associated with a phenotype, such as prostate cancer, cellular processes that connect these genes can offer a causal explanation. In this scenario, a user writes a small Java program that loads the cellular process network curated by a pathway database. Alternatively, instead of using BioPAX exports, the data can be queried dynamically from Pathway Commons (PC) using the PC-client module. The program then runs a
In this use case, a set of high-throughput profiles is used in conjunction with pathway information to find significantly altered modules. Due to the nature of the experimental data, detailed BioPAX information, such as the phosphorylation states of interacting proteins, is unnecessary in this case. To facilitate analysis, complex BioPAX pathways can be reduced to a simple interaction network using the SIF converter, which effectively merges multiple states of an entity into a single node. Alteration profiles can be directly mapped onto these “entity” nodes using external references to Entrez Gene and UniProt databases as defined in BioPAX models. Existing clustering and module-finding algorithms
Visualization is key for understanding complex relationships between biological entities. Paxtools significantly facilitates implementing pathway visualizers and editors by handling reading, writing and syntactic validation. Paxtools also provides additional common functionality for interactive tools, such as undo/redo and object inspection. Finally, the Pathway Commons client enables pathway visualization tools to directly access and query Pathway Commons. Pathway editors and visualizers that use Paxtools for these purposes include ChiBE
The effect of post-translational modifications on protein activity can be useful for assessing the function of mutations in the vicinity of a modification site. Specifically, we would like to know if a given post-translational modification causes a protein to gain or lose a function. We can search pathways for processes that modify a protein and predict how this change will affect the controlled downstream processes. The algorithm also needs to consider processes that contain the protein as part of a molecular complex or as a member of a protein family. Since these inclusion relationships and the type and location of the modifications are stated explicitly in BioPAX, equivalent protein modifications can be matched to integrate findings from different pathways. A complete implementation of this algorithm can be found in the Paxtools code.
Paxtools can run on any platform that has a Java Runtime Environment and was tested on most modern operating systems. Paxtools is open source and is available under the GNU Lesser General Public License (LGPL), which allows users to freely use the code in their software systems with a requirement for attribution. Source code for the current release (4.2.0) can be found in
A rapidly growing open-source BioPAX software infrastructure, available as part of the Pathway Commons project, is built directly on top of Paxtools. It includes a state-of-the-art persistence and full-text search system (cPath2), an advanced validator that allows checking complex rules and best practices, a pathway alignment tool and a framework for quickly mapping experimental data onto BioPAX pathways. Since these tools use a common API, it is possible to combine and re-use software components across multiple applications. Detailed examples of using Paxtools for real-life applications can be found in these projects. We encourage users to contribute to the codebase and expand it to build a pathway informatics platform that addresses the growing needs of the community.
One of our major future goals is to reach a wider community of researchers by implementing different language bindings and usage modes for Paxtools. We are planning to include bindings for the R statistical programming language and implement an interactive scripting mode for more efficient non-programmatic manipulation.
Paxtools can significantly reduce the time spent parsing, normalizing and querying BioPAX information. Without removing these barriers, it is difficult to tap into the rapidly growing computable pathway information corpus produced by multiple pathway databases and curation groups. Paxtools allows researchers to shift their effort from data plumbing to answering biological questions.
Paxtools User's Guide. This guide provides details on how to obtain and use Paxtools. It also provides examples and technical details for the key features as well as a frequently asked questions section.
(PDF)
The complete source code for all Paxtools modules as of the 4.2.0 release as a single bundle. The latest sources and release bundles can be obtained from biopax.org/paxtools.
(TAR)
We would like to thank Paxtools users for their valuable feedback.