Modules of co-occurrence in the cyanobacterial pan-genome reveal functional associations between groups of ortholog genes

doi:10.1371/journal.pgen.1007239

Fig 1.

The cyanobacterial core and pan-genome.

(A) The distribution of CLOGs as a function of the number of assigned strains. (B) The size of the pan-genome estimated for an increasing number of strains. The blue line indicates the mean size of the pan-genome; error bars indicate the standard deviation over 10⁴ randomly sampled subsets of strains. The red line shows a least-squares fit of the power law p ∼ N^g (Heaps’ law), with p denoting the size of the pan-genome and N the number of genomes. The estimated exponent g = 0.62 indicates an open pan-genome. (C) The size of the cyanobacterial core-genome estimated for an increasing number of strains. The blue line indicates the mean size of the core-genome; error bars indicate the standard deviation over 10⁴ randomly sampled subsets of strains. The estimates of the pan- and core-genome do not include the genomes of E. coli and Cyanobacterium UCYN-A.
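The estimates in panels (B) and (C) can be illustrated with a minimal sketch. It assumes each CLOG is given as a set of strain indices, draws random strain orderings instead of independent subsets for each N, and fits the power law by linear regression in log-log space. The function names, the sampling scheme, and the log-log fit are assumptions made for illustration; the caption does not specify the authors' exact procedure.

```python
import math
import random
from statistics import mean, pstdev


def accumulation_curves(profiles, n_strains, n_samples=100, seed=0):
    """Estimate pan- and core-genome size for N = 1..n_strains.

    profiles: list of sets of strain indices, one set per CLOG (assumed format).
    Returns one tuple (pan_mean, pan_sd, core_mean, core_sd) per value of N.
    """
    rng = random.Random(seed)
    pan = [[] for _ in range(n_strains)]
    core = [[] for _ in range(n_strains)]
    for _ in range(n_samples):
        # A random ordering yields one random subset of every size N.
        order = rng.sample(range(n_strains), n_strains)
        for n in range(1, n_strains + 1):
            subset = set(order[:n])
            hits = [len(p & subset) for p in profiles]
            pan[n - 1].append(sum(h > 0 for h in hits))    # present in at least one strain
            core[n - 1].append(sum(h == n for h in hits))  # present in all N strains
    return [
        (mean(pan[i]), pstdev(pan[i]), mean(core[i]), pstdev(core[i]))
        for i in range(n_strains)
    ]


def fit_heaps_law(n_genomes, pan_sizes):
    """Least-squares fit of p = k * N**g on log-transformed data; returns (k, g)."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    g = sxy / sxx
    k = math.exp(y_bar - g * x_bar)
    return k, g


# Toy input (not data from the paper): one CLOG in all three strains, three unique CLOGs.
toy_profiles = [{0, 1, 2}, {0}, {1}, {2}]
curves = accumulation_curves(toy_profiles, n_strains=3, n_samples=20)

# Synthetic curve with a known exponent, used only to check the fit.
ns = list(range(1, 51))
k_est, g_est = fit_heaps_law(ns, [3000.0 * n ** 0.62 for n in ns])
```

In this sketch the mean pan-genome sizes returned by `accumulation_curves` would be passed to `fit_heaps_law`; a fit directly on the untransformed values would weight large N more strongly and can give a slightly different exponent.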

Fig 2.

Network analysis of co-occurring CLOGs.

(A) Orthologous genes are identified using an all-against-all BLASTp comparison and are grouped into clusters of likely orthologous genes (CLOGs). CLOGs are classified into three sets: core CLOGs (present in all strains), shared CLOGs (present in several but not all strains) and unique CLOGs (present in a single strain). (B) The phylogenetic profile of each CLOG indicates the set of strains whose genome is annotated with genes corresponding to the CLOG. Pair-wise co-occurrence of CLOGs is identified using the similarity of phylogenetic profiles. CLOGs are grouped into modules of co-occurring CLOGs using a community-detection algorithm. (C) A network view of co-occurring CLOGs. We identify a total of 563 modules with 1930 CLOGs. Circular genome maps were constructed using the CiVi tool [60].
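The classification in panel (A) and the profile comparison in panel (B) can be sketched as follows. The caption does not name the similarity measure or the community-detection algorithm, so this sketch assumes the Jaccard index with a hypothetical threshold of 0.9, and uses plain connected components as a simple stand-in for community detection. All function names and the toy profiles are illustrative only.

```python
from itertools import combinations


def classify_clog(profile, n_strains):
    """Label a CLOG as core, shared, or unique from its set of strains."""
    if len(profile) == n_strains:
        return "core"
    if len(profile) == 1:
        return "unique"
    return "shared"


def jaccard(a, b):
    """Jaccard index of two phylogenetic profiles (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_edges(profiles, threshold=0.9):
    """Return (clog_u, clog_v, score) for every pair at or above the threshold."""
    edges = []
    for u, v in combinations(sorted(profiles), 2):
        score = jaccard(profiles[u], profiles[v])
        if score >= threshold:
            edges.append((u, v, score))
    return edges


def connected_components(edges):
    """Group CLOGs linked by edges (stand-in for a community-detection step)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


# Toy profiles over four hypothetical strains A-D (not data from the paper).
profiles = {
    "clog1": {"A", "B", "C", "D"},
    "clog2": {"A", "B"},
    "clog3": {"A", "B"},
    "clog4": {"C"},
    "clog5": {"C", "D"},
}
labels = {c: classify_clog(p, n_strains=4) for c, p in profiles.items()}
edges = cooccurrence_edges(profiles)
modules = connected_components(edges)
```

Connected components merge any chain of pairwise links into one group, so a modularity-based method would generally split the network into smaller, denser modules than this stand-in does.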

Fig 3.

Genomic proximity of co-occurring CLOGs.

The average adjacency score (aAS) measures the co-localization of CLOGs grouped into co-occurring modules. (A) A histogram of the aAS. The histogram shows a clear dichotomy between modules whose constituent CLOGs (and hence genes) are co-localized in all genomes (aAS ≈ 1) and modules whose genes are not co-localized (aAS ≈ 0). (B) A scatter plot of the similarity score, which measures the quality of co-occurrence, against the aAS. The plot indicates a positive but weak correlation between the genomic proximity of the genes comprising a module (represented by the aAS) and the quality of co-occurrence. The straight line corresponds to a linear regression and serves as a guide to the eye. (C) A scatter plot of the number of CLOGs associated with a module against the aAS. While larger modules tend to have a lower aAS, the aAS values are relatively well distributed with respect to the number of CLOGs in a module. (D) A scatter plot of the number of strains associated with a module against the aAS. The aAS is again relatively well distributed with respect to the number of participating strains. In both plots the straight line indicates a linear regression and serves as a guide to the eye.

Fig 4.

Selected modules of co-occurring CLOGs and their associated strains.

A black box indicates that a CLOG (y-axis) is associated with a specific strain (x-axis). The first column indicates the module number; the last column indicates the primary annotation of the respective CLOG. Only an excerpt of the modules of co-occurring CLOGs is shown.
