Table 1.
Mapping between biology and computer science terminology used in this paper.
Figure 1.
Taxonomy of PDS components and reactions.
A) In the taxonomy of PDS components there are four representation levels. The highest level (level 0) is the most abstract level, while the lowest one (level 3) represents single molecules. B) In the taxonomy of PDS reactions individual reactions are represented at the lowest level (level 2) and are grouped according to their functionality into three groups at level 1: Activation (A), Binding (B) and Inhibition (I).
Figure 2.
There are three groups of reactions. A) Activation (A) denotes all reactions that directly involve two components X and Y in the production of Z, where the concentration of Z depends on the concentrations of both substrates. B) Binding (B) results in the formation of a protein-protein complex or in the binding of a protein to a DNA promoter region to regulate expression of the corresponding gene. C) Inhibition (I) is a process in which one component blocks the activity of another component.
Table 2.
Summary of all component types of the manually constructed SA, JA and ET sub-models represented at level 2 of the PDS taxonomy of Figure 1A.
Table 3.
Summary of all reaction types of the manually constructed SA, JA and ET sub-models, including the crosstalk connections, represented at level 1 of the PDS taxonomy of Figure 1B.
Figure 3.
Manually constructed PDS model topology visualised as an edge-labelled graph.
This graph, consisting of 175 nodes and 387 edges, is provided in Supporting Information S2 as an interactive graph visualised with the Biomine graph visualisation engine, enabling closer inspection by zooming into its subparts and rearranging node and arc positions in 2D space. The graph is organised into SA, JA and ET pathways with their crosstalk connections. The borders of the nodes for the main pathway components SA, JA and ET are coloured red.
Figure 4.
Principle of decomposing families of components by decoupling of reactions.
The example in this figure shows two conversion types that illustrate the transformation from the biological reaction representation into the edge-labelled graph representation. First, the linolenic acid node is connected directly to the reaction product 13-HPT with an arc labelled A. Second, the LOX node is decomposed from the protein family level (level 2) to the single protein level (level 3). The final result of the conversion is a graph with 8 nodes and 7 edges.
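The decomposition step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six member names LOX1 to LOX6 are placeholders, and the arcs from each member to 13-HPT are an assumed reading of the figure, chosen so that the node and edge counts agree with the caption.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, label, target node)


def decompose_family(edges: List[Edge], family: str, members: List[str]) -> List[Edge]:
    """Replace every edge touching `family` with one edge per family member."""
    result: List[Edge] = []
    for source, label, target in edges:
        sources = members if source == family else [source]
        targets = members if target == family else [target]
        for s in sources:
            for t in targets:
                result.append((s, label, t))
    return result


# Level 2 view: the substrate and the enzyme family both point at the product.
level2 = [
    ("linolenic acid", "A", "13-HPT"),
    ("LOX", "A", "13-HPT"),  # assumed arc direction
]

# Hypothetical member names; the real family members are not listed in the caption.
lox_members = [f"LOX{i}" for i in range(1, 7)]

level3 = decompose_family(level2, "LOX", lox_members)
nodes = {node for source, _, target in level3 for node in (source, target)}
print(len(nodes), "nodes,", len(level3), "edges")
```

Under these assumptions the level 3 graph has one arc from the substrate plus one arc per family member, which is why the edge count grows with the size of the family.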
Figure 5.
Overview of the Bio3graph methodology, its implementation and a sample output.
A) Schematic representation of the Bio3graph methodology. Text processing is performed as a workflow following the boxes in the schematic diagram, resulting in a network of (component1, reaction, component2) triplets. B) Bio3graph implemented as a workflow in Orange4WS. C) The triplet network extracted and composed by Bio3graph. The output network (consisting of 129 components and 1,132 reactions) is visualised with the Biomine visualiser and is available in Supporting Information S5.
Table 4.
Recall and precision analysis for 50 full-length papers.
Figure 6.
New direct PDS relations extracted from the biological literature.
The new direct links result from the Bio3graph processing of 9,586 articles. Bio3graph extracted 14 new direct relations between components; these relations were not identified in the manually built PDS model topology. Note that two of these triplets are trivial: (SAG_metabolite, activates, SA_metabolite) and (NIMIN1_protein, inhibits, NPR1_protein).
Table 5.
Summary of PDS related triplets extracted by the Bio3graph triplet extraction algorithm from 9,586 PubMed Central articles.
Figure 7.
The final PDS model topology constructed by merging the manual and the Bio3graph networks.
A) Edge-labelled graph representing the merged model. B) Venn diagram of the relations. The relations in the manual model are all direct and are coloured red. The intersection between the manual model relations and the correct triplets extracted from the literature is shown in black. Among the correct new triplets, indirect relations are shown in green and direct ones in blue. C) Zoom-in on a part of the merged PDS topology. Links from the manual model are shown in red, new indirect links extracted from the literature in green, new direct links in blue, and the intersection between the manual model and the correct triplets extracted with Bio3graph in black.
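The colour scheme of the merged model amounts to a few set operations, sketched below. The function name and the placeholder triplets are hypothetical and only illustrate the rule stated in the caption; the split of extracted triplets into direct and indirect ones is assumed to be given as input.

```python
from typing import Dict, Set, Tuple

Triplet = Tuple[str, str, str]  # (component1, reaction, component2)


def colour_merged(
    manual: Set[Triplet],
    extracted_direct: Set[Triplet],
    extracted_indirect: Set[Triplet],
) -> Dict[Triplet, str]:
    """Assign each relation of the merged model the colour used in the figure."""
    extracted = extracted_direct | extracted_indirect
    colours: Dict[Triplet, str] = {}
    for triplet in manual | extracted:
        if triplet in manual and triplet in extracted:
            colours[triplet] = "black"  # found manually and in the literature
        elif triplet in manual:
            colours[triplet] = "red"    # manual model only
        elif triplet in extracted_indirect:
            colours[triplet] = "green"  # new indirect relation
        else:
            colours[triplet] = "blue"   # new direct relation
    return colours


# Placeholder relations, not taken from the actual model.
manual = {("X", "activates", "Y"), ("Y", "inhibits", "Z")}
direct = {("X", "activates", "Y"), ("W", "activates", "Z")}
indirect = {("X", "inhibits", "Z")}

colours = colour_merged(manual, direct, indirect)
for triplet, colour in sorted(colours.items()):
    print(colour, triplet)
```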
Figure 8.
The principles of conversion to the edge-labelled graph format.
A) An activation reaction (labelled A) between two components is transformed into a graph with arcs between the reactant and the product nodes. B) Activation (labelled A) at the transcription level is a special type of activation in which Y induces the activation of gene X to produce protein X. In this case the gene transcription level is omitted when the level 2 topology is transformed into the edge-labelled graph. C) A binding relation (labelled B) between two reactant nodes X and Y is transformed into a B relation between the reactants and an additional "produces" relation (labelled P) between the reactant and the product. The latter is introduced to represent the binding of proteins into complexes. Binding is a binary relation represented by a bidirectional edge; in the graph visualisation the arrows are omitted. D) Inhibition (labelled I) is the blocking of the activation or binding reaction between components by a third component X, resulting in reduced production of product Z.
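The four conversion rules can be written down as small functions that emit labelled edges. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the P arcs are emitted from each reactant to the complex, and the inhibition arc is assumed to run from the inhibitor to the product, which the caption does not state explicitly.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, label, target node)


def activation(reactant: str, product: str) -> List[Edge]:
    """Rule A: an arc labelled A from a reactant to the product."""
    return [(reactant, "A", product)]


def transcriptional_activation(inducer: str, protein: str) -> List[Edge]:
    """Rule B: the gene node is omitted, so the inducer points at the protein."""
    return [(inducer, "A", protein)]


def binding(x: str, y: str, complex_name: str) -> List[Edge]:
    """Rule C: a B relation between the reactants plus P arcs to the complex.

    The B edge is stored once and should be read as undirected.
    """
    return [(x, "B", y), (x, "P", complex_name), (y, "P", complex_name)]


def inhibition(inhibitor: str, product: str) -> List[Edge]:
    """Rule D: assumed here to be an arc labelled I from inhibitor to product."""
    return [(inhibitor, "I", product)]


edges = (
    activation("X", "Z")
    + activation("Y", "Z")
    + binding("X", "Y", "XY")
    + inhibition("W", "Z")
)
for edge in edges:
    print(edge)
```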
Figure 9.
Illustration of the triplet extraction.
We show the part of the flow from the input of the POS tagging box in Figure 5 to the output of the triplet extraction box in the same figure. The input to the Genia POS tagger is a pre-processed sentence. After shallow parsing with the Genia POS tagger, the algorithm performs step 2. The final output of the triplet extraction part of the Bio3graph approach is a triplet of the form (subject, predicate, object), which is then transformed and visualised as an edge-labelled graph with the Biomine visualiser.
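The last step, turning extracted triplets into an edge-labelled graph, might look like the sketch below. The two example triplets are the ones quoted in the Figure 6 caption; the mapping from predicates to the labels A, I and B, the predicate "binds", and the adjacency-list layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed mapping from predicate to edge label; "binds" is hypothetical.
PREDICATE_TO_LABEL = {"activates": "A", "inhibits": "I", "binds": "B"}


def triplets_to_graph(triplets: Iterable[Triplet]) -> Dict[str, List[Tuple[str, str]]]:
    """Build an adjacency list: subject -> list of (edge label, object)."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triplets:
        label = PREDICATE_TO_LABEL[predicate]
        graph[subject].append((label, obj))
    return dict(graph)


example = [
    ("SAG_metabolite", "activates", "SA_metabolite"),
    ("NIMIN1_protein", "inhibits", "NPR1_protein"),
]
graph = triplets_to_graph(example)
for subject, arcs in graph.items():
    print(subject, arcs)
```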