Skip to main content
Advertisement

< Back to Article

Figure 1.

Workflow of the ECOD automatic domain classification pipeline.

Unclassified structures enter from the top (white). Firstly, peptides, coiled-coils, and other unclassifiable regions are removed where possible and placed in their respective special architectures (orange). Secondly, unassigned regions of the input sequence are iteratively assigned by descending best hits from BLAST and HHsearch-based searches of ECOD databases. Assemblies of putative domains are optimized and assigned (green). If the chain is incomplete by sequence, a similar process occurs using DaliLite searches. If the query remains unclassified, it is manually curated (yellow).

More »

Figure 1 Expand

Figure 2.

Hierarchical levels of ECOD.

Domains placed within the same Architecture share similar secondary structure content (helix, cyan; sheet, yellow) and geometric arrangement. Domains placed within the same X-group share similar structure but lack a convincing argument for homology (vs. analogy), while those placed within the same H-groups are homologous. X- and H- group structures are colored in rainbow by consecutive secondary structure elements. T-groups distinguish homologous domains with notable differences in topology, such as the illustrated Rift-related metafold [18]. Rift-related half-barrels (colored blue and red) are consistent among the domains, but permutations and strand swaps (green) modify the topology.

More »

Figure 2 Expand

Figure 3.

Number of ECOD H-groups containing 1 or more SCOP superfamily (blue) or CATH homologous superfamily(red).

The majority contain only a single SCOP superfamily(88%) or CATH homologous superfamily (81%). The most merged (not shown) ECOD H-group is the Immunoglobulin-related domains, which contains 47 SCOP superfamilies and 81 CATH homologous superfamiles.

More »

Figure 3 Expand

Table 1.

Summary statistics of ECOD v22b (July 31 2013).

More »

Table 1 Expand

Figure 4.

Classification of ECOD and ECOD hierarchical levels with respect to the PDB and other classifications.

A) A cumulative sum of PDB release dates from Jan-2000 to Jan-2014 (red) compared to classified PDB depositions in ECOD (green), SCOP (cyan), and CATH (blue). Any deposition with at least one domain classified is counted. ECOD consistently classifies more structures than SCOP and CATH and is more up-to-date. b) The cumulative sum of PDB deposition dates in ECOD hierarchical levels. Each group is classified once by its oldest deposition. The number of new levels increases consistently over time over the 2000 to 2014 time period.

More »

Figure 4 Expand

Figure 5.

Classification methods used for non-redundant (NR) chains for weekly ECOD updates.

“Automatic” chains could be completely and confidently classified by domain pipeline and required no manual intervention. “Manual” chains were at best partly classified by software and required manual curation (i.e. some domain boundaries could not be properly detected or some domains could not be reliably classified using sequence methods). Non-domain” chains contained peptides, coiled-coils, or other cases requiring manual curation.

More »

Figure 5 Expand

Figure 6.

Distribution of H-groups in ECOD by architecture (a) and 95% representative domain population (b).

A) H-groups are colored by architecture and sized according to their representative domain population. H-groups smaller than 0.01 radians are not displayed. Those H-groups shown in bottom distributions are labeled. B) The most populated H-groups (>500 95% representative domains) are colored by architecture. The immunoglobulin-related, Rossmann-related, and helix-turn-helix (HTH) H-groups are the most populated H-groups in ECOD. The inset shows the most populated H-groups by number of F-groups.

More »

Figure 6 Expand

Figure 7.

Structure similarity distribution of domain pairs from SCOP superfamily, SCOP fold and ECOD H-group, measured by TM-score.

Data were grouped into three panels by sequence similarity in terms of HHsearch probability (Low: probability ≤20%, Medium: 20%<probability<90%, High: probability ≥90%) and then binned into 20 bins to calculate frequency.

More »

Figure 7 Expand

Figure 8.

A) Distribution of domains per chain for ECOD(red), SCOP(green), and CATH(blue).

Both ECOD and SCOP allow for multi-chain domains (MC), but these are a small fraction of the classification. ECOD contains slightly more single-chain domains than CATH, but less than SCOP. B) ECOD slightly favors smaller domains over SCOP and longer domains over CATH.

More »

Figure 8 Expand

Figure 9.

Venn diagram of the shared homologous domain pairs among those ECOD (cyan), SCOP (green), and CATH (red) nonredundant domains with similar (80%) domain ranges.

A plurality of domain pairs are shared among all three classifications. A large fraction of domain pairs can solely be observed in ECOD. 11.4% of domain pairs are only shared between ECOD and CATH.

More »

Figure 9 Expand

Figure 10.

Growth of ECOD groups with no mapping to SCOP or CATH over time.

Growth of all groups increases as proportion of PDBs classified by SCOP or CATH decreases. Unmapped H-groups represent a significant fraction of total ECOD H-groups. Unmapped X-groups are potentially interesting cases of novelty.

More »

Figure 10 Expand

Figure 11.

Representative domains in ECOD classified by type (manual or provisional) and cluster type (Pfam, HH, or unclustered).

Manual representatives have been inspected by a curator and assignment to the hierarchy has been verified. Provisional representatives contain no close homologous link to a manual representative and cluster separately into a Pfam- or HH-based cluster. Unclustered representatives are either awaiting clustering or cannot be clustered due to some technical problem. The majority of representatives in ECOD are manual Pfam representatives (79%), followed by manual HH-clustered representatives (10%) and provisional Pfam representatives (8%).

More »

Figure 11 Expand

Figure 12.

SAM MTases and Rossmann domains.

(A) SAM MTases as represented by ribosomal protein L11 methyltransferase complexed with SAM (PDB 2nxe). (B) Rossmann domains as represented by formaldehyde dehydrogenase complexed with NAD (PDB 1kol). In (A) and (B), helices are colored in cyan, strands in yellow, and loops in white. The additional strand 7 in SAM-MTase is colored in orange. The respective cofactor, SAM or NAD, is shown in sticks. The Gly-rich loop beneath the cofactor is colored in magenta. The conserved Asp or Glu that forms hydrogen bonds with the adenosine ribose hydroxyls is shown in sticks. Diagrams are made by Pymol (The PyMOL Molecular Graphics System, Schrödinger, LLC. http://www.pymol.org/). (C) Manually modified DALI [22] alignment between the two domains shown in (A) and (B). Starting and ending residue numbers are labeled before and after the alignment. β-strands and α-helices are labeled numerically and shown in arrows and cylinders respectively above the sequence alignment. The Gly-rich loop is highlighted in magenta, and the conserved Asp or Glu is highlighted in red.

More »

Figure 12 Expand

Figure 13.

Structures of homologous members of the FZ-CRD (A,B), glypican (C), folate receptor (D), and NPC1 (E).

Conserved disulfide bonds are shown in pink sticks with labels by their sides. Four core helices are labeled H1–H4. N- and C-termini are shown. Homology detected by distinct cysteine residue patterns was used as the basis for merging these families into a homologous group (H-group) in ECOD. F. Pairwise Dali Z- scores between pairs of the structures. G. Multiple sequence alignment of the structures shown, with conserved cysteines highlighted on black background. Cysteines forming a disulfide bond are labeled by the same sign for FZ-CRDs from Frizzled8 and MuSK (line above the sequences) and glypican, folate receptor and NPC1 (line below the sequences). Four core helices (H1–H4) are shown below the alignment in cylinder representation.

More »

Figure 13 Expand

Figure 14.

ECOD recognizes novel evolutionary relationships.

A) Duf371 (3cbn) forms an 8-stranded β-barrel from intertwined β-strands of a tandem structural duplication. The N-terminal half (blue shades) includes an overside connection between adjacent β-strands (blue) that follows a conserved His (black spheres). The symmetrically related C-terminal half (red shades) includes a similar overside connection (red) following a less conserved His (gray spheres). B) The Duf371 C-terminal repeat (salmon) is rotated about the Z-axis to superimpose (RMSD 1.3) with the N-terminal repeat (slate). C) The GutA-like PTS system IIA component (2f9h) forms a similar duplicated β-barrel. An invariant His in the C-terminal half likely represent the PTS IIA phosphorylation site. D) The PK β-barrel domain-like fold (1pkla1) displays a similar intertwined topology, but retains only a single overside connection (blue) in the N-terminal half. E) PSI-BLAST alignment of the Duf371 repeats detected with Mefer0473 sequence supports the duplication event, with sequence similarities indicated between N-terminal and C-terminal halves. A structure-based alignment of the 2F9H C-terminus is included below. Structural elements (arrow for strand and cylinder for helix) and conservations (calculated by Al2Co [59]) are indicated above/below the corresponding sequences. Conserved positions are highlighted yellow (mainly hydrophobic) and black (polar). Surface representations of F) PTSIIA in the same orientation as in panel C and G) Duf371 in the rotated orientation of panel B are colored in rainbow according to conservation, from blue (less) to red (more).

More »

Figure 14 Expand