One of the authors (JK) is currently an employee of Roche Diagnostics GmbH. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.
Wrote the paper: AG OK HPL. Main Software Developer: AG. Software Developer: DF. Software Testing: DF PN OK HPL JK. Design and Concept: AG OK HPL MK JK.
Interactive visual analysis of biological high-throughput data in the context of the underlying networks is an essential task in modern biomedicine with applications ranging from metabolic engineering to personalized medicine. The complexity and heterogeneity of data sets require flexible software architectures for data analysis. Concise and easily readable graphical representation of data and interactive navigation of large data sets are essential in this context. We present BiNA - the Biological Network Analyzer - a flexible open-source software for analyzing and visualizing biological networks. Highly configurable visualization styles for regulatory and metabolic network data offer sophisticated drawings and intuitive navigation and exploration techniques using hierarchical graph concepts. The generic projection and analysis framework provides powerful functionalities for visual analyses of high-throughput omics data in the context of networks, in particular for the differential analysis and the analysis of time series data. A direct interface to an underlying data warehouse provides fast access to a wide range of semantically integrated biological network databases. A plugin system allows simple customization and integration of new analysis algorithms or visual representations. BiNA is available under the 3-clause BSD license at
Systems Biology aims to study the relationships and interactions between various parts of a biological system and to integrate this information in order to understand its functionality. In this process, the visualization and visual analysis of the network data plays an essential role in understanding complex biological processes and mechanisms. Many tools, which strongly differ in the way of presenting the data, have been developed for visually exploring biological networks
In addition to the static pathway maps, many tools that dynamically visualize biological networks have been developed in the last decade. Most of these tools have been designed to fulfill the requirements of particular applications.
The stand-alone visualizer Cytoscape
Another visualization tool that has been designed specifically for the integrative visual data mining of biological pathways is VisANT
VANTED
A problem with visualizing biological data is the size of the networks, which can be very large, e.g., for protein-protein interaction data. OSPREY
Since many users appreciate the classical layouts used in static biological maps like KEGG, some visualization tools try to dynamically generate similar layouts, which in addition have the advantage of being editable. KGML-ED
A review
A major challenge in visualizing biological networks is that the layouts have to account for additional constraints imposed by biological convention or additional data (e.g., subcellular location). The next step is the visualization of high-throughput data (proteomics, transcriptomics, metabolomics, etc.) in the context of these networks
The incorporation of external data sources is another important point, also addressed by Suderman et al. in
Another major challenge is the improvement of the layouts since most tools do not take into account the underlying biology and drawing conventions known from biochemical textbooks but only the structural relations of the mathematical network. For specific applications, it is however necessary to extend and customize the general layouts to the special needs of the application. Most visualization tools focus on very specific aspects and cannot easily be extended.
Thus, sophisticated layouts for biological networks in general, but also tailored for special applications, need to be developed and combined with existing drawing concepts. Furthermore, the integration of future tasks and applications should be easy. The key challenges in the area of visual analytics of biological networks remain, thus, to combine automated analysis with advanced visualization techniques, and to make the navigation and exploration of huge data sets interactive.
A prior version of the software we present has been published along with the BN++ data warehouse
BiNA is a visual analytics tool for the interactive visualization, exploration, and analysis of biological network data. It uses advanced graph drawing techniques for visualizing the network data. The integrated connection to an R server supports custom pre-analysis of high-throughput (omics) data, which can then be projected onto these networks and visualized and analyzed in a number of ways.
The core of BiNA is a sophisticated visualization concept, combining high-quality graphics, dynamic network visualizations, and very flexible data projections. BiNA provides different network representations and visual styles.
BiNA's KEGG visual style provides pathway layouts that are very similar to the KEGG maps. Additionally, BiNA supports filtering of organism-unspecific parts of a pathway, which improves readability. In this figure, we manually removed disconnected reactions from BiNA's pathway. Furthermore, neighboring pathways can be directly explored and shown in the same visualization, which supports the biological understanding of relationships across the borders of canonical pathways (not shown).
In
The visualization of the KEGG Apoptosis pathway in a layered sub-cellular compartment model demonstrates BiNA's capabilities for integrating cellular location information. For this, information, e.g., from SwissProt
Similar to other tools, BiNA is able to project high-throughput data from various sources to the network. For this, we implemented a generic projection framework, supporting a large number of projection targets. These targets range from simple graph attributes, e.g., color, node size, line thickness, and visibility to complex ones, like sub-cellular compartment association or time series. Time series projections, for example, are realized using small charts, which are drawn instead of node shapes.
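To illustrate the idea of projection targets, the following is a minimal sketch in Python of how numeric values might be mapped onto two simple graph attributes, node color and node size. It is not BiNA's implementation; the function names, the attribute dictionary, the gradient colors, and the size range are all assumptions chosen for the example.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def normalize(value, vmin, vmax):
    """Scale value into [0, 1], clamping values outside [vmin, vmax]."""
    if vmax == vmin:
        return 0.5
    return max(0.0, min(1.0, (value - vmin) / (vmax - vmin)))


def project_to_color(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map a value onto a two-color RGB gradient (hypothetical blue-to-red default)."""
    t = normalize(value, vmin, vmax)
    return tuple(round(lerp(lo, hi, t)) for lo, hi in zip(low, high))


def project_to_size(value, vmin, vmax, min_size=20.0, max_size=60.0):
    """Map a value onto a node size between min_size and max_size."""
    return lerp(min_size, max_size, normalize(value, vmin, vmax))


# Hypothetical registry of projection targets; a real framework would offer more.
PROJECTION_TARGETS = {"color": project_to_color, "size": project_to_size}


def apply_projection(nodes, values, target, vmin, vmax):
    """Write the projected attribute into every node that has a data value.

    nodes:  dict mapping node id -> attribute dict
    values: dict mapping node id -> numeric value (e.g., a log fold change)
    """
    projector = PROJECTION_TARGETS[target]
    for node_id, value in values.items():
        if node_id in nodes:
            nodes[node_id][target] = projector(value, vmin, vmax)
    return nodes
```

Keeping the targets in a registry means that a new target only requires a new projector function, which mirrors the generic character of the framework described above.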
The user can load omics data from various file formats, via drag and drop from a spreadsheet program, or online from the Gene Expression Omnibus (GEO) database
High-throughput datasets can be projected onto a network by simple drag and drop operations. The upper right hand side of the view shows available datasets. When one of these datasets is dragged onto the main network visualization, possible network attributes for projection arise (green boxes). Afterwards, a dialog opens and permits a more detailed configuration of the projection (not shown).
The imported data can be modified directly in BiNA. It is possible to transform the data based on simple mathematical expressions. Access to a number of mathematical functions and operators is provided through an intuitive syntax. User defined functions can be mapped to R expressions
The editor for connecting R expressions with functions in BiNA. The new normalize_vsn(x) function calls the underlying R statement normalizeVSN(x).
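The mapping between a user-defined function and an R statement can be pictured as a small lookup table of expression templates. The sketch below, written in Python, only translates a call such as normalize_vsn(x) into the R statement normalizeVSN(x) as a string; it does not evaluate anything in R. The registry, the function name to_r_expression, and the template syntax are assumptions for illustration and do not reflect BiNA's actual expression editor.

```python
import re

# Hypothetical registry: user-facing function name -> R expression template.
# "{0}" stands for the first argument of the call.
R_FUNCTION_MAP = {
    "normalize_vsn": "normalizeVSN({0})",
}


def to_r_expression(call):
    """Translate a simple call like 'normalize_vsn(x)' into its R statement.

    Only flat, comma-separated arguments are supported in this sketch;
    nested calls would need a real expression parser.
    """
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", call)
    if match is None:
        raise ValueError(f"not a function call: {call!r}")
    name, raw_args = match.groups()
    if name not in R_FUNCTION_MAP:
        raise KeyError(f"no R mapping registered for {name!r}")
    args = [arg.strip() for arg in raw_args.split(",")] if raw_args.strip() else []
    return R_FUNCTION_MAP[name].format(*args)
```

For example, to_r_expression("normalize_vsn(x)") yields the string "normalizeVSN(x)", which could then be handed to an R connection.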
The mathematical expression editor (
For a more complete overview of BiNA's features and a comparison to other tools, we refer to
BiNA has been used in several application scenarios. For example, we are offering a web service called NetworkTrail
The results of NetworkTrail can be visualized using BiNA Webstart. The NetworkTrail plug-in of BiNA provides interactive navigation through the found subnetworks using the toolbox in the top-left corner of the visualization, which supports the evaluation of the results. It is easily possible to switch to a certain subnetwork of size k, or to hide the consensus network, which is the union of all found subnetworks. It is also possible to show or hide the score and the number of relations an edge represents. On the right-hand side, the user can adjust some basic visual mapping properties.
Other applications where we successfully used BiNA for visually analyzing the algorithmic results are miRtrail
In
BiNA acts as a visualizer for the BN++ data warehouse system, which semantically integrates several biological databases and stores them into BNDB. BiNA is able to access BNDB directly via SQL, either the MySQL or the Apache Derby version. BiNA consists of a number of plug-ins (OSGi bundles), which are packed together for distribution. Using these plug-ins BiNA can import various file formats, use an R server for processing experimental data, and visualize and analyze networks in different contexts. The user is able to extend the functionality of BiNA using the public API of the OSGi bundles.
BiNA itself is a modular software tool built upon OSGi
Each core bundle provides an API, which can be used by software developers to extend the functionality of BiNA. These bundles usually correspond to certain application scenarios, e.g., for using BiNA Webstart in an online workflow for visualizing results in a customized way. The developer documentation is available at
The underlying OSGi 4.3 platform provides dynamic loading and unloading of bundles during runtime, automatic handling of native libraries, and a full-featured component/bundle system, which becomes more and more necessary in large software projects. The main application window (GUI) of BiNA is divided into several dynamic views using the Docking Frames window management library
BiNA distinguishes between views and editors. While a view exists only once, editors can have multiple instances, e.g., for different visualizations.
The core of the visualization is a sophisticated visualization model. From a set of universal base components, different visualization styles are realized, e.g., for metabolic and regulatory networks. The latter ones include gene regulatory, signaling, and protein-protein interaction networks.
Base components of the visualization model like nodes, edges, groups, hyperedges, and stars combine the simplicity of graphs with the readability of hierarchically structured elements. During the visualization process these components will be translated into a hierarchic graph using the graph library interface of BiNA. An implementation of this interface is provided by the separate yFiles graph library
Groups of the visualization model are realized using hierarchic nodes and can be opened and collapsed again using mouse gestures. They can contain their own graph layout, which supports multiple layouts in one visualization view and can be used for, e.g., laying out two biological pathways in a different way nearby.
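The behavior of collapsible groups with their own layout can be sketched with a small recursive data structure. The Python example below is an assumption-laden illustration, not the visualization model of BiNA: the class names, the layout field, and the visible_nodes helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str


@dataclass
class Group(Node):
    """A hierarchic node that contains other elements and its own layout."""
    children: list = field(default_factory=list)
    collapsed: bool = False
    layout: str = "organic"  # hypothetical layout identifier per group


def visible_nodes(element):
    """Return the names of all elements that would be drawn.

    A collapsed group is drawn as a single node; an open group is drawn
    together with the visible part of its content.
    """
    names = [element.name]
    if isinstance(element, Group) and not element.collapsed:
        for child in element.children:
            names.extend(visible_nodes(child))
    return names
```

Because each group carries its own layout field, two pathway groups placed side by side could be laid out differently within the same view.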
Every component has an additional graphical abstraction layer describing its visual appearance and available user interactions. This layer can be exchanged by a different visual style, for example, the KEGG visual style for the metabolic representation.
Since groups can be collapsed and single components can be made invisible, a single change to the model can result in a large number of changes to the resulting graph. We address this by updating only those parts of the graph that have been marked as modified by the changes to the model. We further distinguish between topological, visual, and layout changes to avoid unnecessary calculations, which considerably accelerates the rendering process in many cases.
This is also the reason why we subdivide the rendering process into three steps: (1) the construction of the graph topology, (2) assignment of graphical attributes (e.g., color and shape) to each node and edge, and (3) laying out the graphical elements on the screen.
Data projection (see section below) can influence each of these steps. For example, the projection of omics data to node visibility can filter some nodes in the first step. The subdivision into these steps improves the rendering speed, since not all nodes and edges have to be processed by steps two and three, which becomes important for large networks.
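The combination of modification marks and a three-step rendering process can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not BiNA's renderer: we assume that a topological change invalidates all three steps, whereas a visual change only triggers step two and a layout change only step three. All class and method names are hypothetical.

```python
from enum import Enum


class Change(Enum):
    TOPOLOGY = 1
    VISUAL = 2
    LAYOUT = 3


# The three rendering steps, in the order in which they are executed.
STEP_ORDER = ("topology", "attributes", "layout")

# Assumption: which steps have to be repeated for which kind of change.
STEPS_FOR_CHANGE = {
    Change.TOPOLOGY: ("topology", "attributes", "layout"),
    Change.VISUAL: ("attributes",),
    Change.LAYOUT: ("layout",),
}


class Renderer:
    def __init__(self):
        # element id -> set of steps that must be repeated for this element
        self.dirty = {}

    def mark(self, element_id, change):
        """Record that an element was modified by a change to the model."""
        self.dirty.setdefault(element_id, set()).update(STEPS_FOR_CHANGE[change])

    def render(self):
        """Run each step only for the elements marked for it.

        Returns the list of (step, element id) pairs that were processed.
        """
        processed = []
        for step in STEP_ORDER:
            for element_id in sorted(self.dirty):
                if step in self.dirty[element_id]:
                    processed.append((step, element_id))
        self.dirty.clear()
        return processed
```

In this sketch, recoloring one node leads to a single call of step two for that node, while unmodified elements are not touched at all.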
Every component of the visualization model has a set of properties, e.g., label text, shape, size, text font, color, line width. These properties have default values that can be changed globally with a predefined visual style. These properties can, however, be overridden by the user at any time. The resulting graphical representation can thus be customized in any way the user desires.
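The precedence of user overrides over style values over defaults corresponds to a layered lookup. A minimal sketch in Python using the standard library class collections.ChainMap is given below; the property names and values, including the colors assigned to the KEGG-like style, are invented for the example.

```python
from collections import ChainMap

# Hypothetical default properties of a node.
DEFAULTS = {"shape": "ellipse", "color": "#cccccc", "line_width": 1.0}

# Hypothetical visual style that changes some defaults globally.
KEGG_LIKE_STYLE = {"shape": "rectangle", "color": "#bfffbf"}


def resolve_properties(user_overrides, style, defaults=DEFAULTS):
    """Resolve the effective properties of one component.

    Lookup order: user overrides first, then the visual style, then defaults.
    """
    return dict(ChainMap(user_overrides, style, defaults))
```

With a user override of {"color": "#ff0000"}, the component keeps the rectangle shape of the style and the default line width, but is drawn in the user-selected color.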
The network visualizations can be saved using the BiNA-specific file format (BML). The saved visualization also contains all data sets projected onto the visualization and the visualization-wide properties.
As discussed above, BiNA is able to directly access biological networks from a BNDB
A generic projection framework supports the mapping of arbitrary external information, primarily omics data sets, onto the network data. As noted before, data projection can influence a large number of visualization attributes, like color, size, stroke, and more complex ones like reaction direction, or subcellular location. The projected data can be imported from suitable files (e.g., mzTab, comma separated values format (CSV/TSV), SOFT), dragged & dropped from text/spreadsheet editors, or downloaded from the GEO database
The data itself is organized in a tabular fashion. Data columns can be of different types. Rows correspond to individual components of the network (i.e., proteins, genes, metabolites) and are identified by a single identifier or a set of identifiers (typically database accession IDs). Columns can be grouped in order to express sample semantics (e.g., replicate structure). Identifiers will be mapped to the internal identifiers of the network nodes. If necessary, external identifier mapping services (UniProt) are employed to translate between different identifier systems.
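The mapping of row identifiers to network nodes can be sketched as a two-stage lookup: a direct match against the internal identifiers, followed by a translation step. In the Python sketch below, a plain dictionary stands in for an external identifier mapping service; the function name, the data layout, and the example identifiers used in the usage note are assumptions for illustration.

```python
def map_rows_to_nodes(rows, node_ids, translation=None):
    """Assign table rows to network nodes via their identifiers.

    rows:        list of identifier lists, one list per table row
    node_ids:    set of internal node identifiers
    translation: optional dict, external identifier -> internal identifier
                 (stand-in for an external mapping service)

    Returns (mapping, unmapped), where mapping is a dict from node id to the
    list of row indices assigned to it, and unmapped lists the row indices
    for which no identifier could be resolved.
    """
    translation = translation or {}
    mapping, unmapped = {}, []
    for row_index, identifiers in enumerate(rows):
        hit = None
        for identifier in identifiers:
            # Try a direct match first, then the translated identifier.
            candidate = identifier if identifier in node_ids else translation.get(identifier)
            if candidate in node_ids:
                hit = candidate
                break
        if hit is None:
            unmapped.append(row_index)
        else:
            mapping.setdefault(hit, []).append(row_index)
    return mapping, unmapped
```

For instance, with node_ids = {"P04637", "P38398"} and translation = {"BRCA1": "P38398"}, a row identified by "BRCA1" would be assigned to node "P38398", while a row with an unknown identifier is reported as unmapped.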
BiNA will use the selected sample of the series for data projection. If multiple samples are selected, BiNA will automatically try to compute a temporary sample consisting of their mean values, which will then be used for projection. It is also possible to enter a custom formula using an intuitive syntax to calculate the temporary sample. The user can choose from a large set of built-in functions or provide new user-defined functions, which will be passed on to and evaluated by R.
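The computation of a temporary sample from several selected samples can be sketched in a few lines of Python. The handling of missing values (they are skipped, and a row without any value stays missing) is an assumption of this sketch, as are the function name and the data layout.

```python
from statistics import fmean


def temporary_sample(series, selected):
    """Compute the row-wise mean of the selected samples.

    series:   dict mapping sample name -> list of row values (None = missing)
    selected: list of sample names to combine
    """
    columns = [series[name] for name in selected]
    result = []
    for values in zip(*columns):
        present = [value for value in values if value is not None]
        result.append(fmean(present) if present else None)
    return result
```

If only one sample is selected, the function returns its values unchanged, which matches the single-sample case described above.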
In the latter case, BiNA must be connected via Java Native Interface (JNI) to an R installation or via network to a server running the Rserve package. The connection to R also enables loading of arbitrary matrix data from R into a series in BiNA and vice versa, such that existing workflows in R can be reused. Derived samples can be exported again together with the series, either in the BiNA data format (BDF) or in comma/tab separated values format (CSV). If BiNA is connected to R using JNI, the
Since the network visualization of BiNA uses the concept of hierarchic graphs, it is not reasonable to apply standard graph algorithms, for example a shortest-path search, to the resulting visualization graph. In particular, the groupings of protein families, protein complexes, biochemical reactions, and pathways, and the multiple representations of entities used for visualization purposes, can conflict with graph-theoretic interpretations. To address this problem, the different network representations of BiNA provide a configurable way of flattening the visualization graph. For example, the hierarchical modeling of protein families can be flattened by splitting the node representing the family into its individual members. This feature can be used by developers to access the underlying biological network of BiNA. It is also used for exporting visualizations into simple graph formats (e.g., GraphML).
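Flattening a protein family by splitting the family node into its members can be sketched as follows in Python. The sketch assumes a single level of grouping and represents the graph as a plain edge list; the function name and the example family in the usage note are illustrative and not taken from BiNA.

```python
from itertools import product


def flatten(edges, families):
    """Replace every family node by its individual members.

    edges:    iterable of (source, target) pairs of the visualization graph
    families: dict mapping family node -> list of member nodes

    An edge that touches a family is expanded into one edge per member.
    Nested families are not handled in this sketch.
    """
    def members(node):
        return families.get(node, [node])

    flat = set()
    for source, target in edges:
        flat.update(product(members(source), members(target)))
    return sorted(flat)
```

For example, an edge from a family node "RAS" with the members HRAS, KRAS, and NRAS to the node RAF1 is expanded into three edges, on which a standard algorithm such as a shortest-path search can then operate.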
BiNA also provides a way to project the results, calculated on the simple graph, back to the visualization. We provide two example algorithms, a
The default visual style of the metabolic network representation provides drawings known from biochemical textbooks. For this, reaction arrows are drawn using quadratic curves and instead of node shapes, the chemical structure of metabolites can be shown by painting MOL files from the KEGG database. The resulting network is a bi-partite graph, containing reaction nodes (enzymes) and metabolite nodes. As noted before, reactions are realized using groups containing the co-factors of a reaction. Since the reaction groups do not have their own layout style, co-factors and their edges are integrated into the parental layout, if the group is opened.
The KEGG visual style is built upon the default one. It provides different properties for the visualization elements, and uses a special KEGG layout algorithm for pathways to imitate the KEGG maps. The layout information is provided in part by the KGML files of KEGG. For example, a KGML file contains information about node positions but lacks information about edge routes, which must therefore be computed by the layouter.
The metabolic representation provides functionality for improving the readability and clarity of the visualization, like context-specific hiding of co-factors and aliasing of compounds and reactions. Several strategies for selecting co-factors are provided to the user, e.g., by information from the source data, editable pre-defined lists, heuristics, or manually. Furthermore, the integration of context-specific pathway validation strategies (e.g., those presented in
The regulatory network representation visualizes various regulatory interactions and complex assemblies of entities (proteins and metabolites). These can be organized in complexes and families, which are realized using groups. Similar to the metabolic representation, the regulatory one supports aliasing of entities. This improves the readability if multiple pathways, which are also realized as groups, contain the same entity. Different instances of an entity can be highlighted and it is possible to merge instances again. Another feature is the exploration of the underlying data source. For this, the user can explore the neighborhood of an entity, either upstream, downstream, or both, by adding these regulatory interactions to the visualization.
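The exploration of an entity's neighborhood can be sketched as a filter over directed interactions. The Python example below is a minimal illustration; the representation of interactions as (source, target) pairs and the function name are assumptions, and a real data source would be queried rather than scanned as a list.

```python
def neighborhood(interactions, entity, direction="both"):
    """Collect the interactions adjacent to an entity.

    interactions: iterable of directed (source, target) pairs
    direction:    "upstream"   -> interactions pointing to the entity
                  "downstream" -> interactions starting at the entity
                  "both"       -> the union of the two
    """
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    found = []
    for source, target in interactions:
        if direction in ("downstream", "both") and source == entity:
            found.append((source, target))
        elif direction in ("upstream", "both") and target == entity:
            found.append((source, target))
    return found
```

The returned interactions are exactly those that would be added to the visualization when the user expands the entity in the chosen direction.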
For rendering and layout algorithms, BiNA uses the yFiles for Java graph library. We provide the
In the metabolic representation, for example, the organic layouter is extended to support the readability by laying out dangling co-factors according to the reaction flow. For the regulatory representation, which supports changing the visualization background to visualize subcellular locations, we extended the hierarchic layouter of the yFiles graph library to support layer assignments in an easy way.
BiNA, the Biological Network Analyzer, is a mature tool for the integrative analysis of omics data in a network context. In contrast to other tools, it permits not only the direct import of networks and omics data from flat files, but also features a direct connection to an integrated data warehouse. By supporting common file formats for transcriptomics, metabolomics, and proteomics as well as arbitrary tabular data, nearly any data set of interest can be projected onto the networks and explored interactively. A generic projection system supports users in creating visually concise and information-rich visualizations that can be directly exported in publication quality. Furthermore, the interface to the statistical programming language R gives access to a wealth of existing statistical methods. These can be applied to the datasets loaded in BiNA, and the results of the statistical analysis can be projected back onto the network visualization.
The integration of advanced graph drawing techniques permits a pleasing graphical representation of the networks. Inclusion of well-established layouts (e.g., the KEGG layouts) provides easily recognizable pathways while adding the benefits of interactive navigation and simultaneous display of multiple pathways. Where these layouts do not suffice, the built-in editing capabilities permit extension, correction, and curation of the representation as well as the underlying data.
BiNA's standardized plugin interface permits developers to add custom extensions whenever needed. We expect that additional plugins will extend the functionality soon.
BiNA (version 2.4.1) is available free of charge under an open-source license (3-clause BSD license) with exception of the yFiles bundle, which is free but not open-source. Source code, documentation, and installable packages are available from the project web site at
Our thanks go to Mark Rurik, Johannes Veit, and Matthias Walzer for persistent and intensive software testing. We also thank all those who contributed feature requests, bug reports, and ideas during the development of BiNA.