Figure 1.
Venn diagram depicting 15 intersections for the four rickettsial groups.
Classification scheme based on molecular phylogeny estimation [28], the topology of which is shown in the lower left; AG = ancestral group, TG = typhus group, TRG = transitional group, SFG = spotted fever group. Genome codes are as follows: Br = R. bellii str. RML369-C, Bo = R. bellii str. OSU 85-389, Ca = R. canadensis str. McKiel, Pr = R. prowazekii str. Madrid E, Ty = R. typhi str. Wilmington, Ak = R. akari str. Hartford, Fe = R. felis str. URRWXCal2, Ri = R. rickettsii str. Sheila Smith CWPP, Co = R. conorii str. Malish 7, and Si = R. sibirica str. 246. Arthropod hosts are illustrated for each genome, and strains known to harbor plasmids are depicted.
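As a quick check on the count of 15, the regions of a four-set Venn diagram correspond to the non-empty combinations of the four groups (2^4 - 1 = 15). The sketch below is only an illustration of that arithmetic; the function name `venn_regions` is hypothetical and is not part of the original analysis.

```python
from itertools import combinations

# The four rickettsial groups named in the legend.
GROUPS = ["AG", "TG", "TRG", "SFG"]


def venn_regions(groups):
    """Return every non-empty combination of groups, smallest combinations first."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


regions = venn_regions(GROUPS)
print(len(regions))  # 2**4 - 1 = 15
```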
Figure 2.
Alignment of 10 rickettsial genomes.
Taxa are in the same position as in estimated trees in Figure 3, with taxon abbreviations explained in the Figure 1 legend. Alignment created using Mauve [189] after reindexing the R. sibirica genome (see text for details).
Figure 3.
Estimated phylogenies of ten rickettsial taxa based on 731 representative core proteins.
(A) Tree from Bayesian analysis. Three MCMC chains were primed with a neighbor-joining tree and run independently for 25000 generations in model-jumping mode. Burn-in was attained by 2500 generations for all chains, and a single tree topology with exclusive use of the Jones substitution model was observed in post burn-in data. The consensus tree shown here thus has 100% support for every branch. Branch support is from the distribution of posterior probabilities from all trees minus the burn-in. (B) Tree from exhaustive search using parsimony. Branch support is from one million bootstrap replicates.
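To make the branch-support statement concrete, the following minimal sketch shows how clade support can be computed as the fraction of post burn-in trees that contain each clade. It is not the software used for the figure: the function `clade_support`, the sampling interval of 500 generations, and the toy samples are all assumptions made for illustration.

```python
from collections import Counter


def clade_support(samples, burn_in_generations):
    """Return clade -> fraction of post burn-in trees that contain the clade.

    samples: list of (generation, clades), where clades is a set of
    frozensets of taxon codes (one frozenset per clade in the sampled tree).
    """
    kept = [clades for gen, clades in samples if gen > burn_in_generations]
    if not kept:
        raise ValueError("no samples remain after discarding the burn-in")
    counts = Counter(clade for clades in kept for clade in clades)
    return {clade: n / len(kept) for clade, n in counts.items()}


# Toy chain (made-up data): one tree sampled every 500 generations up to 25000.
# Early samples carry a different grouping; later samples all agree.
tg = frozenset({"Pr", "Ty"})
early = frozenset({"Pr", "Ca"})
samples = [(g, {early} if g <= 2500 else {tg}) for g in range(500, 25001, 500)]

support = clade_support(samples, burn_in_generations=2500)
print(support[tg])  # 1.0, i.e. 100% support once the burn-in is discarded
```

When every retained tree has the same topology, as the legend reports, each clade receives a support of 1.0.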
Figure 4.
Illustration of representative and non-representative OGs and their categorization into Class 1 and Class 2 OGs.
Taxon abbreviations are explained in the Figure 1 legend. Dark circles depict gene presence, while open circles depict gene absence. (A) Representative OGs: orthologous groups with only one ORF per included genome. Our analysis includes ten rickettsial genomes, thus representative OGs include between 2 and 10 ORFs. Four examples are shown. (B) Non-representative OGs: orthologous groups with multiple ORFs from at least one included genome, composed of either recent (orthologs) or distant (paralogs) gene duplications (dupl). False singleton OGs contain only one taxon, but with multiple ORFs from that taxon (example on right). Four examples are shown. (C) Class 1 OGs (C1OGs): orthologous groups comprising single rickettsial groups (e.g., AG, TG, TRG, and SFG), shared rickettsial groups (subgeneric), plasmid-harboring genomes, and genomes with common arthropod hosts. Two representative (left) and two non-representative (right) C1OGs are shown. (D) Class 2 OGs (C2OGs): orthologous groups with patchy distribution across the rickettsial tree, depicting gene losses and/or genes acquired laterally. Two representative and two non-representative C2OGs are shown.
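The definitions in panels A and B can be expressed as a small decision rule. The sketch below is a hypothetical illustration, not the OrthoMCL pipeline: the function `classify_og`, its input format of (genome code, ORF identifier) pairs, and the ORF names are assumptions. It does not attempt the Class 1 versus Class 2 split, which depends on the tree and host information shown in the figure.

```python
from collections import Counter


def classify_og(members):
    """Classify a group of ORFs using the definitions in panels A and B.

    members: list of (genome_code, orf_id) pairs, e.g. ("Pr", "orf_1").
    """
    if not members:
        raise ValueError("a group must contain at least one ORF")
    per_genome = Counter(genome for genome, _ in members)
    if len(members) == 1:
        return "singleton"            # a lone ORF, not an orthologous group
    if len(per_genome) == 1:
        return "false singleton"      # several ORFs, all from one taxon
    if max(per_genome.values()) > 1:
        return "non-representative"   # some genome contributes multiple ORFs
    return "representative"           # exactly one ORF per included genome


# Hypothetical ORF identifiers, for illustration only.
print(classify_og([("Pr", "orf_a"), ("Ty", "orf_b")]))                  # representative
print(classify_og([("Br", "orf_a"), ("Br", "orf_b"), ("Bo", "orf_c")]))  # non-representative
print(classify_og([("Fe", "orf_a"), ("Fe", "orf_b")]))                  # false singleton
```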
Table 1.
Distribution of representative and non-representative OGs predicted across 14354 ORFs from ten rickettsial genomes, and their categorization into Class 1 and Class 2 OGs.1
Table 2.
Breakdown of membership (no. ORFs) across 2082 rickettsial OGs.
Table 3.
Distribution across 10 rickettsial genomes of OGs and singletons containing proteins with ankyrin (ANK) and tetratricopeptide repeat (TPR) motifs, proteins with rickettsial palindromic elements (RPE), proteins associated with transposable elements (TPN), proteins of toxin-antitoxin modules (TA), and phage-related proteins.
Figure 5.
Comparison of the distributions of 1300 representative and 145 non-representative class 1 OGs (C1OGs), 66 false singletons, and 1467 singleton ORFs.
Slices depict 16 generic and subgeneric groups, false singletons, singletons, plasmid-associated groups, and two host-related groups, with outer circle colors depicted in schema. Taxon abbreviations, including subgeneric groups, are explained in the Figure 1 legend. (A) Distribution of 1300 representative C1OGs and 1467 singletons. (B) Distribution of 79 non-representative C1OGs and 66 false singletons.
Figure 6.
Manual curation of 259 non-representative OGs predicted by OrthoMCL.
Schema depicts 179 OGs repaired to representative after stitching together split ORFs (larger pie chart) and remaining true non-representative OGs defined by in-paralogs.
Table 4.
Manual evaluation of 259 non-representative OGs across ten rickettsial genomes.
Table 5.
Characterization of 259 non-representative OGs per ten rickettsial genomes1.
Figure 7.
Distribution of representative and non-representative class 1 OGs (C1OGs) and singleton ORFs over estimated rickettsial phylogeny.
Boxes depict the distribution of phylogenetic groups, singletons, plasmid-associated groups, and host-related groups: Red = AG rickettsiae, aquamarine = TG rickettsiae, blue = TRG rickettsiae, brown = SFG rickettsiae, gray = higher-level groupings, light green = R. bellii strains only. Orange boxes depict genes found on the pRF plasmid of R. felis str. URRWXCal2 and on the chromosomes of R. felis and both R. bellii strains (as of this publication the R. bellii plasmids remain unavailable). Genes specific to single rickettsial genomes (singletons) are in yellow boxes, with taxon abbreviations explained in the Figure 1 legend. Host-specific groups are defined by green (insect) and tan (tick) boxes. Genome statistics were compiled from the PATRIC and NCBI databases. Cladogram is based on trees shown in Figure 3. Inset in dashed box describes general schema for each box. *Total R. felis genome size: 1,485,148 bp = chromosome; 62,829 bp = pRF and 39,263 bp = pRFδ.
Figure 8.
Bioinformatic analysis of core representative OGs.
(A) Assignment of 731 core representative RiOGs to predicted cellular function categories. Format follows that established at the COG database (NCBI) except for cf = combined function and rpe = rickettsial palindromic element. (B) Comparison of the distribution of cellular function categories across 731 core rickettsial OGs (Ri), a recent protein expression profile for R. felis [40] (Rf), and COGs for three other bacteria: Escherichia coli (Ec), Yersinia pestis (Yp) and Chlamydia trachomatis (Ct). Inset at left shows the number of genes per genome for cellular function categories involved in organic and inorganic transport and metabolism (E, F, G, H, I, P, and Q) followed by the percentage these genes comprise of total protein-encoding genes. Results from a six-way regression analysis are shown in the right inset.
Table 6.
OGs missing in the lineage spanning R. canadensis and TG rickettsiae.
Figure 9.
Phylogeny estimation of the ten analyzed rickettsial taxa plus R. helvetica and R. australis based on 16 proteins.
See Table S13 for gene names and sequence accession numbers. Tree estimated under parsimony (see text).
Table 7.
Results of a BLASTP search for RiOG_2081 using RP338 (R. prowazekii) as a query1.
Table 8.
OGs present only in TRG rickettsiae.
Table 9.
OGs present only in R. bellii strains and TRG rickettsiae.
Table 10.
OGs present only in SFG rickettsiae.
Table 11.
Results of BLASTP searches evaluating two OGs (1496 and 1497) predicted by OrthoMCL to contain only insect-associated rickettsiae.
Table 12.
Results of BLASTP searches evaluating two OGs (RiOG_1005 and RiOG_1012) predicted by OrthoMCL to contain only tick-associated rickettsiae.
Table 13.
Distribution of the 68 R. felis pRF plasmid ORFs within the OGs predicted by OrthoMCLA.
Figure 10.
Analysis of the distribution of 1467 singleton ORFs omitted from OG prediction across 10 rickettsial genomes.
(A) Singleton ORFs across four rickettsial groups. (B) Singleton ORFs across 10 rickettsial genomes. The first number is the total number of singleton ORFs per taxon; the second is the number of those singleton ORFs annotated as hypothetical proteins (HPs). Dashed lines in pie charts separate characterized proteins from HPs, with percentages given only for HPs. (C) Average lengths of singleton ORFs with predicted functions versus singleton ORFs annotated as HPs for all ten analyzed rickettsial genomes.
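The comparison in panel C amounts to averaging ORF lengths separately for the two annotation classes. The sketch below shows one way to do so; the function `mean_length_by_annotation`, its input format, and the lengths are made up for illustration and are not values from the study.

```python
from statistics import mean


def mean_length_by_annotation(orfs):
    """Return (mean length of characterized ORFs, mean length of HP ORFs).

    orfs: iterable of (length, is_hypothetical) pairs.
    A class with no ORFs yields None instead of a mean.
    """
    orfs = list(orfs)
    characterized = [length for length, is_hp in orfs if not is_hp]
    hypothetical = [length for length, is_hp in orfs if is_hp]
    return (
        mean(characterized) if characterized else None,
        mean(hypothetical) if hypothetical else None,
    )


# Toy singleton ORFs for one genome (invented lengths, in amino acids).
toy_orfs = [(300, False), (500, False), (90, True), (110, True)]
characterized_mean, hp_mean = mean_length_by_annotation(toy_orfs)
print(characterized_mean, hp_mean)
```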