Table 1.
Expansions of gene families.
Figure 1.
Histogram of the normalized size of gene families.
For each family, we count the genes in the family and subtract the number of genomes containing at least one member of the family.
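As a minimal sketch of this normalization, assuming each family is supplied as a list of (genome, gene) pairs (a hypothetical input layout, not the authors' data format), the value shown in the histogram could be computed as follows:

```python
def normalized_family_size(members):
    """Return (number of genes) - (number of genomes with >= 1 member).

    `members` is an iterable of (genome_id, gene_id) pairs. This layout is
    an assumption made for illustration only.
    """
    pairs = set(members)                      # each (genome, gene) pair counts once
    n_genes = len(pairs)                      # genes in the family
    n_genomes = len({genome for genome, _ in pairs})  # genomes represented
    return n_genes - n_genomes


# Hypothetical family: two copies in genomeA, one copy in genomeB.
family = [("genomeA", "a1"), ("genomeA", "a2"), ("genomeB", "b1")]
print(normalized_family_size(family))
```

Under this definition, a value of 0 means the family has exactly one copy in every genome where it occurs, and positive values count the extra copies.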
Figure 2.
Relative contribution of horizontal gene transfer in protein family expansions.
Table 2.
Difference (Δ) between the ages of IGD and HGT per clade.
Figure 3.
Abundance of IS and prophages and increased inference of IGD events when they are included in the analysis.
The bar plot (left y-axis) shows the percentage of gene family expansions of IS and phage origin. The line plot (right y-axis) indicates the increase in the number of expansions assigned to duplications when the co-localization criterion is ignored and IS and prophages are included in the dataset.
Figure 4.
Gene expression differs according to gene origin.
Paralogs are more highly expressed, as measured by the codon adaptation index, than xenologs. Xenologs, however, are more highly expressed than genes without paralogs or xenologs.
Figure 5.
Evolutionary rates differ between paralogs and xenologs.
Non-synonymous (dN) and synonymous (dS) substitution rates in paralogs (blue; dashed linear fit) and xenologs (red; solid linear fit) in all clades computed using Codeml from PAML [76] (model = 1, fix_omega = 0).
Figure 6.
Protein family construction pipeline.
Starting with a databank of proteins, we first performed all pairwise similarity searches using BLASTP. The hits were filtered by the length of the match (70% of the length of the query) and by the bitscore (30% of the maximal bitscore, calculated by aligning a protein against itself). To build the gene families, we ran MCL blastline and then removed all singletons, IS, and phages. To build the core genome, we used OrthoMCL along with a synteny filter based on M-GCAT clusters. Finally, using presence/absence and phylogenetic information, we obtained the protein families with expansions.
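The hit-filtering step can be illustrated with a short sketch. It assumes that both thresholds are minimums, that the maximal bitscore is the query's score against itself, and that hits arrive as dictionaries with the field names used below. All of these are illustrative assumptions, not details confirmed by the caption.

```python
def filter_hits(hits, self_bitscore, min_cov=0.70, min_score_frac=0.30):
    """Keep BLASTP hits that pass the coverage and bitscore filters.

    hits: iterable of dicts with keys "query", "subject", "align_len",
          "query_len", "bitscore" (hypothetical field names).
    self_bitscore: dict mapping a protein id to the bitscore obtained by
          aligning that protein against itself.
    Thresholds are treated as minimums, which is an assumption.
    """
    kept = []
    for h in hits:
        coverage = h["align_len"] / h["query_len"]
        score_frac = h["bitscore"] / self_bitscore[h["query"]]
        if coverage >= min_cov and score_frac >= min_score_frac:
            kept.append(h)
    return kept


# Hypothetical hits for one query protein whose self-hit bitscore is 200.
hits = [
    {"query": "p1", "subject": "p2", "align_len": 90, "query_len": 100, "bitscore": 120.0},
    {"query": "p1", "subject": "p3", "align_len": 50, "query_len": 100, "bitscore": 150.0},
    {"query": "p1", "subject": "p4", "align_len": 95, "query_len": 100, "bitscore": 40.0},
]
self_bitscore = {"p1": 200.0}
kept = filter_hits(hits, self_bitscore)
print([h["subject"] for h in kept])
```

In this toy input, only the hit to p2 passes both filters: p3 covers too little of the query and p4 scores too low relative to the self-hit.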
Figure 7.
Cumulative distribution function plot of protein similarity.
Colored lines correspond to CDF plots of the similarity between orthologous proteins of the core genome for the comparison of E. coli K12 W3110 with genomes at increasing phylogenetic distance. The gray line corresponds to the similarity between homologous genes in the E. coli K12 W3110 genome.
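For readers unfamiliar with this type of plot, the following is a minimal sketch of how an empirical CDF could be derived from a list of pairwise similarity values. The input values are invented for illustration, and the function is not taken from the authors' pipeline.

```python
def empirical_cdf(values):
    """Return (x, F(x)) points of the empirical CDF of `values`.

    Points are sorted by x. With tied values, several points share the same
    x, and the largest F among them is the CDF value at that x.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


# Hypothetical percent-similarity values for four ortholog pairs.
similarities = [95.0, 80.0, 99.0, 90.0]
for x, f in empirical_cdf(similarities):
    print(x, f)
```

Each curve in such a plot gives, for a similarity value x, the fraction of protein pairs whose similarity is at most x, so curves for more distant genomes are expected to rise at lower similarity values.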