Fig 1.
Superfamily assignment of PAS domains in sequence and structure classification databases.
(A) Extracellular PAS-like domains; (B) intracellular PAS domains. Assignments of PDB structures by Pfam [26] (red), SCOP [27] (green) and CATH [28] (blue) are shown as Venn diagrams drawn to scale.
Fig 2.
Comparison of sequence- and structure-based definitions for extracellular PAS-like domains.
(A) Vibrio parahaemolyticus chemoreceptor (PDB: 2QHK); (B) Vibrio cholerae chemoreceptor (PDB: 3C8C). Domains are visualized on the sequences with the corresponding amino acid positions (top) and on the structures (bottom). Cache domains (cyan) are defined by Pfam; PAS domains (green and magenta) were defined by visual inspection of the corresponding structures.
Table 1.
Newly defined Cache superfamily.
Fig 3.
Length distribution of Cache domains identified using the new domain models.
Results are shown for searches of the UniProt database associated with Pfam 27.0 (June 2012 release) using the newly built single and double Cache models and the unchanged YkuI_C model. Shaded areas show the upper and lower length boundaries of known single and double Cache domain structures. Outliers represent partial protein sequences and partial matches to the models (very short sequences), as well as sequences with large insertions within the Cache domain (very long sequences). See S2 Data for details.
Fig 4.
Relationship between Cache (red), PAS (blue) and GAF (green) superfamilies.
(A) HMM-to-HMM comparisons. Nodes represent domain families; links represent reciprocal hits in HHsearch. Hits with an E-value <1e-3 are shown as thick lines, hits with an E-value <1e-1 as thin lines, and hits with a probability score >90 as dotted lines. Filled circles represent PAS and GAF domain families that were identified in HHpred searches using the new Cache models; families that were not identified in these searches are shown as empty circles. (B) Sequence-to-sequence comparisons. The outer circle represents domain families. Links between individual sequences represent reciprocal BLAST hits at an E-value threshold of 1e-8, the lowest E-value at which no links between superfamilies were found. However, the overall relationships shown here persist at less stringent E-values.
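The reciprocal-hit criterion in panel B can be illustrated with a minimal Python sketch. It assumes all-against-all BLAST+ results in the default 12-column tabular format (`-outfmt 6`, with the E-value in the eleventh column) and a hypothetical `superfamily_of` mapping from sequence IDs to superfamily labels. A pair is treated as linked when each sequence hits the other at or below the E-value threshold. This is an illustration under those assumptions, not the authors' actual pipeline.

```python
# Sketch only: assumes BLAST+ tabular output (-outfmt 6, default 12 columns).
EVALUE_COL = 10  # 0-based index of the E-value column in -outfmt 6


def best_evalues(lines):
    """Map each ordered (query, subject) pair to its lowest E-value."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[EVALUE_COL])
        if query == subject:
            continue  # ignore self-hits
        if evalue < best.get((query, subject), float("inf")):
            best[(query, subject)] = evalue
    return best


def reciprocal_links(lines, threshold=1e-8):
    """Return unordered pairs where A hits B and B hits A, both at or below threshold."""
    best = best_evalues(lines)
    links = set()
    for (query, subject), evalue in best.items():
        reverse = best.get((subject, query), float("inf"))
        if evalue <= threshold and reverse <= threshold:
            links.add(frozenset((query, subject)))
    return links


def cross_superfamily_links(links, superfamily_of):
    """Keep only links joining sequences assigned to different superfamilies.

    superfamily_of is a hypothetical dict: sequence ID -> superfamily label.
    """
    return {pair for pair in links if len({superfamily_of[x] for x in pair}) > 1}
```

For example, with one strong Cache-Cache reciprocal pair and one weak Cache-PAS reciprocal pair, `reciprocal_links(hits)` keeps only the Cache-Cache link at the default threshold of 1e-8, so `cross_superfamily_links` returns an empty set; relaxing `threshold` to 1e-3 also admits the Cache-PAS link. Raising the threshold until cross-superfamily links disappear mirrors how a cutoff such as 1e-8 can be chosen.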
Fig 5.
Phyletic distribution of PAS (blue), GAF (green) and Cache (red) domains.
Flags in the outer three layers indicate the presence of each domain in the corresponding genome. The tree was built using taxonomic ranks retrieved from NCBI.
Fig 6.
Relative abundance of known extracellular sensory domains in prokaryotes.
Domain counts were obtained by running PfamScan against a dataset of non-redundant prokaryotic extracellular sequences, the same dataset used for HMM construction (see Methods).
Fig 7.
Examples of newly identified and better defined Cache domains in diverse signal transduction proteins from bacteria, archaea and eukaryotes.
Domain architectures for representative sequences from model organisms are shown along with their UniProt accession numbers. Newly defined Cache domains are shown in red. Cache boundaries defined by the previous Pfam models are shown in pink (Cache) and green (MCP_N). HAMP domains are shown as grey circles, PAS domains as cyan circles, and HisKA domains as white circles. Other Pfam domains are abbreviated as follows: MCP, MCPsignal; GGDEF, GGDEF; GC, guanylate cyclase; HK, the histidine kinase HATPase_c domain; RR, response regulator; VWA, a combination of VWA_N and VWA domains; VGCC, VGCC_alpha2.