APAV: An advanced pangenome analysis and visualization toolkit

doi:10.1371/journal.pcbi.1013288

Fig 1.

Overview of APAV pipeline.

(A) Workflow for PAV calling and subsequent analysis. The coordinates of the target regions are first extracted from a GFF or BED file. The coverage of both the whole region and its element regions is calculated from the BAM files. PAVs are determined according to this coverage, and interactive reports are generated. Subsequent analyses can be performed using the PAV tables, including genome size estimation, classification and statistical analysis, phenotypic association analysis, and visualization of elements. (B) Interactive analysis reports. The PAV report presents PAV tables, coverage data, pangenome sequences, genome annotation, and sequence alignments. The sample report presents sample tables, phenotype information, and real-time PAV analysis results.
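The coverage-based calling step in panel (A) can be illustrated with a minimal sketch. It is not APAV's implementation: the function names, the per-base depth list, the `min_depth` cutoff, and the 0.5 presence threshold are all assumptions chosen for illustration.

```python
def region_coverage(depths, start, end, min_depth=1):
    """Fraction of positions in [start, end) whose read depth is >= min_depth.

    `depths` is a hypothetical per-base depth list for one sample,
    standing in for values that would be derived from a BAM file.
    """
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")
    covered = sum(1 for d in depths[start:end] if d >= min_depth)
    return covered / length


def call_pav(coverage, threshold=0.5):
    """Return 1 (presence) if coverage reaches the threshold, else 0 (absence).

    The 0.5 default is an assumed value, not one taken from the paper.
    """
    return 1 if coverage >= threshold else 0


# Toy depth track for a 10-bp sequence in one sample.
depths = [0, 0, 3, 5, 2, 0, 1, 1, 0, 0]
gene_cov = region_coverage(depths, 2, 8)     # whole region
element_cov = region_coverage(depths, 0, 2)  # one element inside it
print(call_pav(gene_cov), call_pav(element_cov))
```

Applying the same two functions to every region and every sample would yield a region-by-sample PAV table of the kind the downstream analyses consume.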


Fig 2.

Visualization of pangenome analysis results in rice genomes.

(A) Proportion and distribution of core, softcore, distributed, and private genes. The proportion is shown in the pie chart and the distribution of gene numbers is shown in the bar plot. (B) Distribution of gene counts across accession groups. Each point represents a sample. (C) Pangenome size estimation. The pangenome and core-genome sizes are plotted for different accession groups of rice samples. (D) Heatmap of the PAV profile. The annotations above and to the left indicate the total number of genes. Each row of the heatmap is a sample and each column is a gene. (E) PCA results of the distributed gene PAV. Each point represents a sample. (F) Manhattan plot of phenotype association analyses for plant height. The five points with the lowest p-values (the most significant associations) are marked with red dots. (G) An example illustrating the correlation between phenotypic groups and the gene PAV. (H) An example displaying phenotypic differences between presence and absence group samples.
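The gene categories in panel (A) and the size curves in panel (C) can be sketched from a PAV table as follows. This is an illustrative sketch only: the category cutoffs (core = present in all samples, softcore = present in at least 90%, private = present in exactly one) are common conventions assumed here rather than APAV's documented defaults, and the function names and the "absent" label are hypothetical.

```python
def classify_genes(pav, softcore_fraction=0.9):
    """Assign each gene a category from its presence frequency.

    `pav` maps a gene ID to a list of 0/1 calls, one per sample.
    The cutoffs are assumptions made for illustration.
    """
    classes = {}
    for gene, calls in pav.items():
        n = len(calls)
        present = sum(calls)
        if present == 0:
            classes[gene] = "absent"       # hypothetical label for empty rows
        elif present == n:
            classes[gene] = "core"
        elif present / n >= softcore_fraction:
            classes[gene] = "softcore"
        elif present == 1:
            classes[gene] = "private"
        else:
            classes[gene] = "distributed"
    return classes


def size_curve(pav, order):
    """Pangenome and core-genome sizes as samples are added in `order`."""
    pan_sizes, core_sizes = [], []
    for k in range(1, len(order) + 1):
        chosen = order[:k]
        # Pan: genes present in at least one chosen sample.
        pan_sizes.append(sum(1 for c in pav.values() if any(c[i] for i in chosen)))
        # Core: genes present in every chosen sample.
        core_sizes.append(sum(1 for c in pav.values() if all(c[i] for i in chosen)))
    return pan_sizes, core_sizes


# Toy PAV table: four genes across three samples.
pav = {"g1": [1, 1, 1], "g2": [1, 0, 0], "g3": [0, 1, 1], "g4": [1, 1, 0]}
print(classify_genes(pav))
print(size_curve(pav, [0, 1, 2]))
```

A size estimate like the one plotted in panel (C) would typically average such curves over many random sample orderings; a single fixed ordering is used here to keep the sketch deterministic.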


Fig 3.

Element-level analysis for Os04g0373400 gene in the rice genome.

(A) Coverage distribution of the gene and all CDSs. (B) The number of samples with gene/CDS absence and presence within each rice group. Significant associations are observed between the rice groups and CDS1, CDS3, and CDS4 (all p < 0.001, Fisher’s exact test). (C-E) Heatmaps of element-level PAVs in terms of PAV (C), coverage (D), and sequence alignment depth (E) of elements in the Os04g0373400 gene. The absent regions of the corresponding samples are highlighted with red dashed boxes.
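The group-versus-PAV association reported in panel (B) can be illustrated with a two-sided Fisher's exact test on a 2x2 table. This sketch assumes only two sample groups, whereas the figure compares several rice groups, and the function name and counts are hypothetical; it is not the code used to produce the reported p-values.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows are two sample groups; columns are presence and absence counts.
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Hypothetical counts: group 1 has the element in 3 of 3 samples,
# group 2 in 0 of 3 samples.
print(fisher_exact_2x2(3, 0, 0, 3))
```

For more than two groups, an r x 2 generalization of the test (or a library routine) would be needed in place of this 2x2 version.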
