Reactome graph database: Efficient access to complex pathway data

doi:10.1371/journal.pcbi.1005968

Fig 1.

A simplified example where reactions only contain reactants and products represented by the class PhysicalEntity.

(a) In the relational use case, two junction tables are required to model these many-to-many relationships. (b) SQL query used to retrieve the input and output entities of a given reaction; two join operations are needed per junction table. (c) The same reaction modelled as a graph. The reaction (green node) has named outgoing relationships to its input and output entities (purple nodes). (d) The same query written in Cypher, in a shorter and more intuitive manner.
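The relational side of this figure can be illustrated with a minimal sketch, assuming an in-memory SQLite database. The table and column names (`reaction`, `physical_entity`, `reaction_input`, `reaction_output`) are hypothetical and are not the actual Reactome relational schema; the sketch only shows why each junction table costs two joins.

```python
import sqlite3


def build_demo_db():
    """Create a toy schema with two junction tables (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE reaction (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE physical_entity (id INTEGER PRIMARY KEY, name TEXT);
        -- One junction table per many-to-many relationship.
        CREATE TABLE reaction_input (reaction_id INTEGER, entity_id INTEGER);
        CREATE TABLE reaction_output (reaction_id INTEGER, entity_id INTEGER);

        INSERT INTO reaction VALUES (1, 'A + B -> C');
        INSERT INTO physical_entity VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO reaction_input VALUES (1, 1), (1, 2);
        INSERT INTO reaction_output VALUES (1, 3);
        """
    )
    return conn


# Two joins per junction table: reaction -> junction -> physical_entity.
QUERY = """
SELECT 'input' AS role, pe.name
FROM reaction r
JOIN reaction_input ri ON ri.reaction_id = r.id
JOIN physical_entity pe ON pe.id = ri.entity_id
WHERE r.id = ?
UNION ALL
SELECT 'output' AS role, pe.name
FROM reaction r
JOIN reaction_output ro ON ro.reaction_id = r.id
JOIN physical_entity pe ON pe.id = ro.entity_id
WHERE r.id = ?
ORDER BY 1, 2
"""


def participants(conn, reaction_id):
    """Return (role, entity name) rows for one reaction."""
    return conn.execute(QUERY, (reaction_id, reaction_id)).fetchall()


if __name__ == "__main__":
    for role, name in participants(build_demo_db(), 1):
        print(role, name)
```

In the graph model described in panels (c) and (d), the junction tables disappear because each input or output is a named relationship attached directly to the reaction node.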

Fig 2.

Representation of the content migration.

The example shows a Reaction class reduced to its inputs, outputs, catalyst and regulators. A model class instance is converted to a graph database node where (1) slots with primitive value types become node properties and (2) slots holding instances of another class become relationships.
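The two conversion rules can be sketched as follows. This is an illustrative toy, not the batch importer itself: the `Reaction` and `PhysicalEntity` dataclasses, the `db_id` field, and the `to_graph` helper are all hypothetical names chosen for the example, and the "graph" is just a dictionary of nodes plus a list of relationship triples.

```python
from dataclasses import dataclass, field, fields, is_dataclass

PRIMITIVES = (str, int, float, bool)


@dataclass
class PhysicalEntity:
    db_id: int
    name: str


@dataclass
class Reaction:
    db_id: int
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def to_graph(instance, nodes=None, rels=None):
    """Convert an instance (and everything it references) to nodes and relationships."""
    nodes = {} if nodes is None else nodes
    rels = [] if rels is None else rels
    if instance.db_id in nodes:  # already converted, avoid revisiting
        return nodes, rels

    props = {}
    nodes[instance.db_id] = {"label": type(instance).__name__, "properties": props}

    for slot in fields(instance):
        value = getattr(instance, slot.name)
        if isinstance(value, PRIMITIVES):
            # Rule (1): primitive-valued slots become node properties.
            props[slot.name] = value
            continue
        targets = value if isinstance(value, list) else [value]
        for target in targets:
            if is_dataclass(target):
                # Rule (2): slots holding other instances become relationships
                # named after the slot.
                rels.append((instance.db_id, slot.name, target.db_id))
                to_graph(target, nodes, rels)
    return nodes, rels


if __name__ == "__main__":
    atp = PhysicalEntity(1, "ATP")
    adp = PhysicalEntity(2, "ADP")
    nodes, rels = to_graph(Reaction(10, "demo reaction", [atp], [adp]))
    print(nodes)
    print(rels)
```

Naming each relationship after its slot keeps the graph close to the original class model, which is the property the caption highlights.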

Fig 3.

A schematic diagram of the new ecosystem.

The relational database is converted to a graph database via the batch importer, which relies on the Domain Model. Spring Data Neo4j and AspectJ are the two main pillars of the graph-core, which also rests on the Domain Model. Users access services or use tools that rely directly on the graph-core, a library that eliminates boilerplate code for data retrieval and offers a data persistence mechanism. Finally, export tools take advantage of Cypher to generate flat mapping files.

Fig 4.

Examples of frequent use cases that can be answered using Cypher queries.

(a) Retrieving the participating molecules for the “Interleukin-4 and 13 signalling” pathway. (b) Retrieving the pathways in which CCR5 participates.

Fig 5.

Comparison of the response and elapsed time for one user sequentially retrieving 5,000 reaction instances from the graph and relational databases (blue and orange respectively).

The graph database software ecosystem achieved a 93% average improvement in performance compared to that of the relational database.

Fig 6.

Response time versus an increasing set of users simultaneously performing queries for 5,000 reaction instances.

As the load scales from one to 20 concurrent users (active threads), the performance of the relational database drops, while the graph database maintains a low response time and good throughput.

Fig 7.

Throughput measured in transactions per second, versus the number of users concurrently performing queries for 5,000 reaction instances in Homo sapiens.

Fig 8.

The Reactome graph database in numbers.
