Graphia: A platform for the graph-based visualisation and analysis of high dimensional data

doi:10.1371/journal.pcbi.1010310

Table 1.

Summary of network visualisation tools commonly used for the analysis of biological data.

Fig 1.

Graphia user interface displaying a correlation graph.

(A) Graph display area, showing correlation graph with a cluster selected (unselected nodes faded). (A1) Display context menu options (right click). (B) Node (row) attribute display area. (B1) Table of selected nodes and their attributes, imported and calculated within the tool. (B2) Data plot area, in the case shown here a mean histogram of selected node values. (B3) Visualisation of column annotations. (B4) Data plot context menu options for changing plot (right click). (1a) Add Transform button, (1b) Active transforms, (2a) Add visualisation button, (2b) Active visualisations, (3) General toolbar, (4) Attribute parameter selection, (5) Display of graph metrics (number of nodes, edges, components), (6) Plot/table function toolbar.

Fig 2.

Different graph visualisation options.

(A) 3D perspective view, smooth shading (the default), with visualisation of a categorical node attribute (MCL cluster). (B) 3D orthographic view, flat shading (no perception of distance; all nodes are the same size unless sized by attribute value). (C) 3D perspective view, smooth shading. (D) 2D view, smooth shading. (E) 2D view, flat shading. (F) Compressed 2D layout, flat shading, showing node overlap view. Visualisation of (G) betweenness centrality values, (H) eccentricity values, (I) PageRank values. G-I are continuous (numerical) attributes, so a colour spectrum and size gradient are used for node display (2D, smooth shading). Betweenness and eccentricity are calculated for both nodes and edges, so the visual encoding is applied to both.

Fig 3.

Visualisation of taxonomic trees.

(A) A taxonomic tree of all mammals was downloaded from the NCBI’s Taxonomy database, with nodes coloured according to type, i.e. subspecies (blue), species (pink), genus (orange), etc. The graph comprises 9,843 nodes and 9,862 edges and is shown with a 2D layout. (B) Zoomed-in view of the boxed area shown in A, with a single node selected (Western gorilla). (C) Right-clicking a node provides the ability to search the web for the node identifier, either via Google or through a predefined database selected in the Network tab of the Options dialogue. (D) Taxonomic tree of all insects from the NCBI’s Taxonomy database, with nodes coloured by Louvain cluster. The graph consists of 275,328 nodes and 275,528 edges and so represents a large graph for which visualisation in 2D is challenging.

Fig 4.

Analysis of cell and gene associations in scRNA-Seq data.

The structure of scRNA-Seq data is commonly represented using approaches such as (A) t-SNE and (B) UMAP, as shown here for immune cells derived from the Tabula Muris dataset. However, the distance between data points and groups of data points is difficult to interpret. (C) Graphia enables the construction of cell-to-cell networks built on a similarity parameter. Here, the 48 most significant PCA values for each cell were first calculated and this PCA profile was used to construct a correlation network. The plot at the bottom left of C shows the PCA profiles of cells in the two largest cell clusters. To better show graph structure, a k-NN (k = 10) transformation was applied and outlier cells were removed (r < 0.85 and node degree < 10, nodes coloured white). The graph comprises 12,498 nodes (cells) and 143k edges. Cell clusters have been annotated as the cell types defined by the authors. (D) A gene correlation network generated from these data by first calculating the average expression of genes within cell clusters and then calculating a correlation matrix from these values. (E) Plots show the average expression profile (y-axis) of a selection of gene clusters across the aggregated cell clusters (x-axis). The label gives the cluster number, e.g., C1, the number of genes within the cluster (966) and the association of the genes with a given biology or cell type.
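The construction described for panel C (correlate profiles, apply a threshold, then keep only each node's strongest neighbours) can be sketched in a few lines of Python. This is an illustrative sketch only, not Graphia's implementation: the function names, the toy four-value profiles, and the rule that an edge survives if it ranks in the top k for either endpoint are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / sqrt(vx * vy)


def correlation_knn_graph(profiles, r_min, k):
    """Build edges between profiles with r >= r_min, then keep, for each
    node, only its k highest-scoring edges (assumed k-NN rule)."""
    names = list(profiles)
    neighbours = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= r_min:
                neighbours[a].append((r, b))
                neighbours[b].append((r, a))
    edges = {}
    for a, candidates in neighbours.items():
        for r, b in sorted(candidates, reverse=True)[:k]:
            edges[frozenset((a, b))] = r
    return edges


# Hypothetical toy profiles standing in for per-cell PCA values.
profiles = {
    "cell1": [1.0, 2.0, 3.0, 4.0],
    "cell2": [1.1, 2.1, 2.9, 4.2],
    "cell3": [4.0, 3.0, 2.0, 1.0],
    "cell4": [3.9, 3.2, 2.1, 0.8],
}
edges = correlation_knn_graph(profiles, r_min=0.85, k=1)
for pair, r in edges.items():
    print(sorted(pair), round(r, 3))
```

With real data the all-pairs loop is quadratic in the number of cells, so a graph of the size quoted above would need a vectorised or approximate approach; the sketch is only meant to make the threshold and k-NN steps concrete.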

Fig 5.

Visualisation of the pangenome of Staphylococcus aureus.

(A) The full pangenome of 778 isolates. Nodes represent individual orthologous genes as identified by PIRATE. Node size is determined by the number of genomes in which a gene has been identified. Edges denote where two genes are syntenic, and their thickness is determined by the number of times this syntenic connection is observed across isolates. Syntenic stretches of core genes have been collapsed for clarity using the “Contract Edges” transform, and low-confidence nodes and edges (n < 3) have been removed. Nodes are coloured by Weighted Louvain Clustering (granularity = 1.0). (B) Highly variable region (boxed area in A) with a high density of “phage-like” genes. Nodes and edges are sized and coloured by frequency. (C) The highlighted genes are all found in a single S. aureus isolate, RF122. (D) The agrABCD locus coloured by gene-association clustering. Frequently, an alternative allele is not identified as being the same gene, but its position is strongly indicative of shared function. (E) The same locus coloured by gene identity.
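The node and edge weighting described for panel A can be illustrated with a short Python sketch. It assumes each isolate is available as an ordered list of gene identifiers, which is a simplification invented for this example and not the PIRATE output format; the function name and the toy isolates are likewise hypothetical. Node weight is the number of genomes containing a gene, edge weight is the number of times two genes are observed adjacent, and anything seen fewer than three times is dropped.

```python
from collections import Counter


def synteny_graph(genomes, min_count=3):
    """Count gene occurrences and adjacent-gene (syntenic) pairs across
    genomes, then drop nodes and edges observed fewer than min_count times."""
    node_counts = Counter()
    edge_counts = Counter()
    for order in genomes.values():
        node_counts.update(set(order))  # count each gene once per genome
        for a, b in zip(order, order[1:]):
            if a != b:
                edge_counts[frozenset((a, b))] += 1
    nodes = {g: n for g, n in node_counts.items() if n >= min_count}
    edges = {
        e: n
        for e, n in edge_counts.items()
        if n >= min_count and all(g in nodes for g in e)
    }
    return nodes, edges


# Hypothetical gene orders for four isolates; geneX is a rare insertion.
genomes = {
    "iso1": ["geneA", "geneB", "geneC", "geneD"],
    "iso2": ["geneA", "geneB", "geneC", "geneD"],
    "iso3": ["geneA", "geneB", "geneX", "geneC", "geneD"],
    "iso4": ["geneA", "geneB", "geneC"],
}
nodes, edges = synteny_graph(genomes)
print(nodes)
print({tuple(sorted(e)): n for e, n in edges.items()})
```

The sketch ignores strand, contig boundaries and circular chromosomes, all of which a real pangenome graph would need to handle.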

Fig 6.

Analysis of single nucleotide genome variant data.

(A) The graph shown was constructed from data from the 1000 Genomes Project based on the correlation (r threshold = 0.238) between the allele dosages at 23,675 SNVs from chromosome 22. Nodes represent the 2,504 individuals included in the study and edges the three most significant correlations with their neighbours (k-NN was applied where k = 3). In most cases, individuals group with others from the same continent, although there are instances where this does not appear to be the case. Visualisation of edge weights (Ai) also highlights cases where individuals would appear to be closely related. (B) Colouring of nodes by the attribute ‘population’ provides higher resolution, and populations showing a high degree of homogeneity have been labelled. (C) Transposing the data upon import reveals SNVs whose pattern across the genomes covaries. Clustering of these data shows many to represent haplotype blocks, and inspection of their profile across genomes demonstrates that some SNV clusters are associated with a given ethnic grouping, e.g. cluster 3 (Africans) and cluster 14 (East Asia), whilst others show little obvious association with ethnicity, e.g. cluster 6. Plots show the average score of SNVs within a cluster (y-axis, 0, 1, 2) across the 2,504 individuals ordered by continent and then population (x-axis).
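The two views of the same matrix in panels A and C, and the per-cluster profile plotted in C, can be made concrete with a small Python sketch. The layout (rows as SNVs, columns as individuals), the function names and the toy dosages are assumptions for illustration and do not come from the study data.

```python
def transpose(matrix):
    """Swap rows and columns, so whichever entity was in columns becomes
    the row (node) entity for correlation."""
    return [list(column) for column in zip(*matrix)]


def cluster_mean_dosage(dosages, snv_rows):
    """Average allele dosage (0, 1 or 2) over the SNVs in one cluster,
    giving one value per individual (column)."""
    n = len(snv_rows)
    n_individuals = len(dosages[0])
    return [
        sum(dosages[i][j] for i in snv_rows) / n
        for j in range(n_individuals)
    ]


# Hypothetical dosages: 3 SNVs (rows) x 3 individuals (columns).
dosages = [
    [0, 1, 2],
    [0, 2, 2],
    [2, 0, 0],
]
by_individual = transpose(dosages)           # rows are now individuals
profile = cluster_mean_dosage(dosages, [0, 1])  # cluster of SNVs 0 and 1
print(by_individual)
print(profile)
```

Correlating the rows of `dosages` relates SNVs to one another, while correlating the rows of `by_individual` relates individuals, which is the distinction the transposition in panel C relies on.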
