Fig 1.
Pangenome composition at different levels and with different methods.
Table 1.
Genetic variations with interruptions in coding genes.
Fig 2.
Genetic variations in shell genes.
Presence and absence of the 128 shell genes in the pangenome at the gene level and the protein level in each sample. Detailed information about these 128 shell genes can be found in S1 Table. Low-quality genes and non-Rv genes are not shown. The phylogenetic tree on the left was constructed in our previous study from 33,220 SNPs detected across the whole genome with reference to H37Rv, using the maximum likelihood method.[20] Genetic distance information was removed to better present the topology of the tree. Bright red boxes represent genes completely (>90% of length) deleted in that genome; dark grey boxes represent genes present in the genome but interrupted by SVs or high-impact SNPs; blue boxes represent genes with valid ORFs; light grey boxes represent genes with unclassified interruptions.
Fig 3.
Selective pressure, nucleotide diversity, and homoplasy in the pangenome (n = 3,742). Only high-quality Rv genes with > 3 CDSs with valid ORFs and at least one segregating nucleotide site are shown. Each circle represents a gene. The x axis is Tajima’s D and the y axis is nucleotide diversity π. Circle size shows the count of homoplasy events. Colors show the level of significance of Tajima’s D. In the legend, the first number in parentheses is the number of genes at that level of significance for Tajima’s D, and the second number is the count of homoplasy events indicated by the corresponding circle size. The red line shows the average nucleotide diversity across the pangenome. The blue line shows π predicted by a linear regression model for genes above the average nucleotide diversity, and the purple line shows π predicted for genes below the average nucleotide diversity.
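For readers unfamiliar with the two statistics plotted in this figure, the following is a minimal sketch of how per-site nucleotide diversity π and Tajima's D can be computed for one gene using the standard textbook formulas. It is not the pipeline used in this study. The function names and the toy alignment are hypothetical, and the sequences are assumed to be aligned, of equal length, and free of gaps or ambiguous bases.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (k)."""
    n = len(seqs)
    total = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    return total / (n * (n - 1) / 2)


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity: k divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D (Tajima 1989); None if it is undefined for this input."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:  # assumption: mirror the filter described in the caption
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (hypothetical): every variant is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
pi = nucleotide_diversity(toy)
d = tajimas_d(toy)
```

An excess of rare variants, as in the toy alignment, gives a negative D, while an excess of intermediate-frequency variants gives a positive D.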
Fig 4.
Folded minor allele frequency spectrum for rpsL (a) and katG (b).
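As an illustration of what a folded spectrum counts, the sketch below tallies, for each biallelic site in an alignment, how many sequences carry the minor allele. It is a simplified example rather than the method used to produce this figure. The function name and the toy data are hypothetical, and the sequences are assumed to be aligned and to contain only A, C, G, and T.

```python
from collections import Counter


def folded_sfs(seqs):
    """Folded site frequency spectrum.

    Returns a list where index i holds the number of biallelic sites whose
    minor allele is carried by exactly i sequences (index 0 is unused).
    Monomorphic and multiallelic columns are skipped (a simplifying assumption).
    """
    n = len(seqs)
    spectrum = [0] * (n // 2 + 1)
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) != 2:
            continue
        spectrum[min(counts.values())] += 1
    return spectrum


# Toy alignment (hypothetical): three singleton sites among four sequences.
singletons = folded_sfs(["AAAA", "AAAT", "AATA", "ATAA"])
# Toy alignment (hypothetical): one site split evenly between two alleles.
balanced = folded_sfs(["AC", "AC", "AT", "AT"])
```

Folding is used when the ancestral state of each site is unknown, so only the frequency of the rarer allele is recorded.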
Fig 5.
Phylogenetic tree of the 404 genomes with valid katG ORFs. The phylogeny was derived from the one used in Fig 2, with genomes carrying invalid katG ORFs removed. Nucleotide site 947 corresponds to site 944 in H37Rv because of a 3 bp insertion upstream of this position in some genomes. Strips show the sites with homoplasy, with nucleotides in different colors: A = red, C = blue, G = grey, T = orange.
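The offset between site 947 and site 944 can be illustrated with a short sketch that maps an alignment column to a reference coordinate by counting non-gap characters in the aligned reference sequence. This assumes that site 947 is a column in a multiple alignment in which H37Rv carries gaps at the inserted positions, which the caption does not state explicitly. The function name and the toy sequences are hypothetical.

```python
def alignment_to_reference(ref_aligned, column):
    """Map a 1-based alignment column to a 1-based reference coordinate.

    ref_aligned is the reference sequence as it appears in the alignment,
    with '-' marking positions inserted in other genomes. Returns None when
    the column falls inside such an insertion.
    """
    if not 1 <= column <= len(ref_aligned):
        raise ValueError("column is outside the alignment")
    if ref_aligned[column - 1] == "-":
        return None
    # Count reference bases up to and including this column.
    return sum(1 for base in ref_aligned[:column] if base != "-")


# Toy reference (hypothetical): a 3 bp insertion elsewhere creates three gaps.
short_ref = "ACG---TTA"
# Longer toy reference with a 3 bp gap upstream of column 947.
long_ref = "A" * 10 + "---" + "A" * 940
mapped = alignment_to_reference(long_ref, 947)
```

With a 3 bp gap upstream, every later column is shifted by three relative to the reference, which matches the 947 to 944 correspondence described in the caption.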
Table 2.
Population genetic characteristics of genes following different evolutionary patterns.
Table 3.
Over-represented gene categories.
Fig 6.
Network of Enriched Ontologies.
a) Enriched ontologies in the group of genes under high selective pressure. Only genes enriched in actinobacterium-type cell wall biogenesis (GO:0071766) and related ontologies are shown. b) Enriched ontologies in the group of genes showing high homoplasy levels. Only genes enriched in disruption of anatomical structure in another organism (GO:0141060) and related ontologies are shown. Enriched ontology nodes are shown in red and gene nodes in grey.