Fig 1.

A rough schematic of a typical computational analysis pipeline in a malaria genetic epidemiology study, whereby the generation of a genetic distance or similarity matrix is a key step in the data analysis.

The main analysis pipeline is shown with the dark arrows. Additional sensitivity analyses feed into the final interpretation and translation of results and are shown by the light arrows. Bullet points show examples of data modalities, data processing algorithms or analytical approaches. *fineSTRUCTURE is itself a computational pipeline that takes as input phased haplotypes (e.g. a phased variant matrix), computes a co-ancestry matrix and then performs clustering on this matrix. fineSTRUCTURE clustering corresponds to the second stage of the fineSTRUCTURE pipeline.

Fig 2.

Comparison of three genetic distances.

Each panel shows the distances for all 77,028 pairs among 393 P. falciparum isolates from the Eastern GMS. A: 1−IBS distance; B: 1−IBD distance; C: −log2 IBD distance; D: agreement between 1−IBS and 1−IBD. Note that the y-axes in panels A-C are on a log10 scale. The distances at −log2 IBD of approximately 12.5 are those with an estimated IBD of zero, which was replaced by a lower limit of quantification equal to the smallest IBD value greater than zero. Note that 1−IBD spans the zero to one range, whereas 1−IBS does not.
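
The pair count and the −log2 transformation with a lower limit of quantification can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `neg_log2_ibd` and the toy IBD values are assumptions made for the example.

```python
import math

def neg_log2_ibd(ibd_values):
    """Map IBD estimates to -log2(IBD), replacing zeros by the smallest positive estimate."""
    positive = [v for v in ibd_values if v > 0]
    if not positive:
        raise ValueError("need at least one IBD estimate greater than zero")
    lloq = min(positive)  # lower limit of quantification, as described in the caption
    return [-math.log2(max(v, lloq)) for v in ibd_values]

# Number of unordered pairs among n isolates: n(n - 1) / 2
n_isolates = 393
n_pairs = n_isolates * (n_isolates - 1) // 2

# Toy IBD estimates (illustrative values only, not data from the study)
toy_ibd = [1.0, 0.5, 0.25, 0.0]
toy_dist = neg_log2_ibd(toy_ibd)
```

In the toy input the zero estimate is mapped to the same distance as the smallest positive estimate (0.25), which is why such pairs pile up at a single value. By the same logic, a pile-up near 12.5 would correspond to a smallest positive IBD estimate of roughly 2^−12.5, or about 0.00017.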

Fig 3.

PCoA and PCA of summary n-by-n distance and similarity matrices for n = 393 isolates.

Panels A-C show PCoA applied to the 1−IBS (A), 1−IBD (B), and −log2 IBD (C) distance matrices. Panel D shows PCA applied to the co-ancestry matrix computed using fineSTRUCTURE version 4. Isolates are plotted along the first two principal components. Colours correspond to the different known causative mutations in the Pfkelch13 gene, where green is wild type (WT) and blue is C580Y. Triangles correspond to Pfplasmepsin amplified parasites, and circles correspond to parasites that are WT in Pfplasmepsin.
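
For readers unfamiliar with PCoA, the sketch below shows the classical (Torgerson) form applied to a small distance matrix using NumPy. It is an illustration under stated assumptions rather than the pipeline used for the figure: the function `pcoa` and the toy points are hypothetical, and negative eigenvalues, which can arise when a distance is not Euclidean, are simply clipped to zero here.

```python
import numpy as np

def pcoa(dist, n_components=2):
    """Classical PCoA of a symmetric n-by-n distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram (inner product) matrix
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Assumption for this sketch: clip negative eigenvalues to zero
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy example: four points on a line (illustrative, not study data)
points = np.array([0.0, 1.0, 3.0, 6.0])
toy_dist = np.abs(points[:, None] - points[None, :])
coords = pcoa(toy_dist, n_components=2)
```

Because the toy distances come from points on a line, the first coordinate reproduces the pairwise distances and the second is numerically zero. With genetic distances such as 1−IBD, how much structure the first two components capture depends on the distance used, which is the point of comparing panels A-C.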

Fig 4.

Two distinct dendrograms which depict the same underlying clustering arrangement.

The ordering of the leaves was changed by randomly rotating the internal nodes. The clustering arrangement was produced by applying HAC with average linkage to the −log2 IBD distance matrix. The coloured bars below the dendrograms visualise the corresponding Pfkelch13 mutation of each isolate (green: wild type; blue: C580Y). This illustrates how the ordering of the meta-data is sensitive to arbitrary choices for the dendrogram topology.
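
The effect of rotating internal nodes can be checked on a toy tree. In the sketch below a dendrogram is represented as nested pairs (a representation chosen for this example, not taken from the article), and the helper names `leaves`, `clusters` and `rotate_randomly` are hypothetical.

```python
import random

def leaves(tree):
    """Return leaf labels in left-to-right plotting order."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def clusters(tree):
    """Return the set of clusters (leaf sets) defined by the internal nodes."""
    if not isinstance(tree, tuple):
        return set()
    left, right = tree
    return {frozenset(leaves(tree))} | clusters(left) | clusters(right)

def rotate_randomly(tree, rng):
    """Swap the two children of each internal node with probability 1/2."""
    if not isinstance(tree, tuple):
        return tree
    left, right = (rotate_randomly(child, rng) for child in tree)
    return (right, left) if rng.random() < 0.5 else (left, right)

# Toy dendrogram on six hypothetical isolates
tree = ((("a", "b"), "c"), (("d", "e"), "f"))
rotated = rotate_randomly(tree, random.Random(1))

same_clustering = clusters(rotated) == clusters(tree)
```

Every rotation leaves the set of clusters unchanged, so `same_clustering` is always true, while `leaves(rotated)` may differ from `leaves(tree)`. A binary dendrogram with n leaves has n − 1 internal nodes and therefore 2^(n−1) possible leaf orderings, all depicting the same clustering.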

Fig 5.

Tracking the discrete cluster assignments derived from HAC (with average linkage) according to the genetic distance matrix used.

Each P. falciparum isolate was assigned a colour based on its cluster assignment according to a flattened dendrogram (a dendrogram cut at a given y-axis point) of a clustering arrangement generated by HAC of the 1−IBD distance matrix. In this case the y-axis cut-point was chosen to produce nine distinct clusters (panel A). These colours were then used to track cluster membership when the same HAC algorithm was applied to the 1−IBS and −log2 IBD distance matrices (panels B and C, respectively).

Fig 6.

Tracking the discrete cluster assignments derived from HAC of the −log2 IBD distance matrix according to the linkage function used.

Each P. falciparum isolate is assigned a colour based on its cluster assignment from the average linkage algorithm, where the y-axis cut-point was chosen to produce nine distinct clusters (panel A). Average linkage was arbitrarily chosen as the ‘reference’ method (any of the four linkage functions could be used). These colours are then used to produce stacked barplots for cluster membership derived from three other algorithm specifications (complete, single and Ward’s criterion, panels B-D, respectively).
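
The tracking idea can be sketched with SciPy's hierarchical clustering routines: cluster once with a reference linkage, cut the tree into a fixed number of clusters, and cross-tabulate the reference labels against the labels from each other linkage. The toy data, the choice of three clusters, and the helpers `flat_clusters` and `track` are assumptions for this example. Note also that SciPy documents Ward linkage as correctly defined only for Euclidean distances, so applying it to a genetic distance matrix is itself an assumption worth flagging.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Toy data: three well-separated groups of 10 points, standing in for isolates
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centres])
condensed = pdist(points)  # condensed form of the pairwise distance matrix

def flat_clusters(condensed_dist, method, k):
    """Run HAC with the given linkage and cut the tree into at most k clusters."""
    tree = linkage(condensed_dist, method=method)
    return fcluster(tree, t=k, criterion="maxclust")

def track(reference_labels, other_labels):
    """Cross-tabulate: rows are clusters in the other result, columns are reference clusters."""
    ref_ids, other_ids = np.unique(reference_labels), np.unique(other_labels)
    table = np.zeros((len(other_ids), len(ref_ids)), dtype=int)
    for i, o in enumerate(other_ids):
        for j, r in enumerate(ref_ids):
            table[i, j] = np.sum((other_labels == o) & (reference_labels == r))
    return table

reference = flat_clusters(condensed, "average", k=3)
tables = {m: track(reference, flat_clusters(condensed, m, k=3))
          for m in ("complete", "single", "ward")}
```

Each row of a table corresponds to one stacked bar: the counts say how many members of that cluster carry each reference colour. In this deliberately easy toy case all linkages recover the same three groups, so each bar has a single colour; with real genetic distances the bars can be mixed, which is what the figure is designed to reveal.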

Fig 7.

Heatmaps of genetic distance matrices, whereby isolates have been ordered according to the output of the HAC algorithm (average linkage specified here).

The colour shading was chosen by applying nine shades of purple to a uniform grid over the range of observed values (see panels A-C of Fig 2) in the distance matrix. The visual effect of the clustering in the heatmap is sensitive to this specification; for example, a grid over observed quantiles would produce a different visual effect.
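
The difference between a uniform grid and a quantile grid can be seen on a handful of skewed values. The sketch below uses three shades instead of nine to keep the output short; the helper names and the toy distances are assumptions for illustration, and the quantile definition used (Python's `statistics.quantiles` with `method="inclusive"`) is one choice among several.

```python
import bisect
import statistics

def uniform_edges(values, n_bins):
    """Interior break points of an equal-width grid over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def quantile_edges(values, n_bins):
    """Interior break points placed at the observed quantiles."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")

def assign_bins(values, edges):
    """Index of the colour shade (0 = lightest) for each value."""
    return [bisect.bisect_right(edges, v) for v in values]

# Toy skewed distances (illustrative values only, not data from the study)
toy = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 1.0]
uniform_bins = assign_bins(toy, uniform_edges(toy, 3))
quantile_bins = assign_bins(toy, quantile_edges(toy, 3))
```

With the uniform grid, eight of the nine toy values fall into the lightest shade and the middle shade is unused, whereas the quantile grid spreads the values evenly across the three shades. The same data would therefore look far more or far less "clustered" in a heatmap depending on this choice.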
