Fig 1.
The BrEPS database is generated by a six-step protocol. A: Selection and preparation of sequences. B: All-vs-all BLAST of sequences. C: Complete linkage clustering based on the E-value from BLAST. D: Multiple sequence alignment and pattern creation on selected nodes. E: Pattern verification. F: Preparation of the final database.
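Step C can be illustrated with a minimal sketch of complete linkage clustering on pairwise E-values. It assumes that a smaller E-value means a closer pair, that a pair without a BLAST hit is infinitely distant, and that merging stops at a fixed cutoff; the function name, the input layout, and the cutoff are hypothetical and are not taken from the BrEPS implementation, which builds patterns on selected nodes of the cluster tree rather than on a single flat cut.

```python
from itertools import combinations


def complete_linkage(ids, evalues, threshold):
    """Agglomerative complete-linkage clustering on BLAST E-values.

    ids:       iterable of sequence identifiers
    evalues:   dict mapping frozenset({a, b}) -> E-value of the pair;
               a missing pair is treated as "no hit" (infinite distance)
    threshold: hypothetical cutoff; merging stops once the best
               candidate merge would exceed it
    """
    clusters = [frozenset([i]) for i in ids]

    def linkage(c1, c2):
        # Complete linkage: the worst (largest) E-value over all cross pairs.
        return max(
            evalues.get(frozenset((a, b)), float("inf"))
            for a in c1
            for b in c2
        )

    while len(clusters) > 1:
        # Pick the two clusters with the smallest complete-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy input: a, b and c hit each other strongly; d only hits c weakly.
toy_evalues = {
    frozenset(("a", "b")): 1e-50,
    frozenset(("a", "c")): 1e-30,
    frozenset(("b", "c")): 1e-40,
    frozenset(("c", "d")): 1e-5,
}
toy_clusters = complete_linkage(["a", "b", "c", "d"], toy_evalues, 1e-20)
```

Because complete linkage scores a merge by its weakest cross pair, sequence d stays separate here even though it has a hit against c: it has no hit against a or b.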
Table 1.
Filters applied to UniProt protein entries to parse enzyme data from UniProt flatfiles.
Fig 2.
Detailed overview of the new data selection.
Only Swiss-Prot sequences with evidence at protein level (A) are used as seed sequences to retrieve additional, non-redundant sequences from TrEMBL and Swiss-Prot using UniRef references with ≥ 50% sequence identity (B and C). These additional sequences inherit the corresponding Swiss-Prot annotation and are merged with the seed sequences into one database (D).
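The selection in panels A to D can be sketched roughly as follows. The record layout (`evidence`, `ec`), the cluster mapping, and the rule of skipping clusters whose seeds disagree are assumptions made for illustration; they are not taken from the BrEPS code or from a UniProt API.

```python
def build_database(swissprot, uniref50):
    """Sketch of seed selection and annotation transfer.

    swissprot: dict accession -> {"evidence": str, "ec": str}  (hypothetical layout)
    uniref50:  dict cluster id -> list of member accessions    (hypothetical layout)
    Returns a dict accession -> EC number.
    """
    # (A) Seeds: Swiss-Prot entries with evidence at protein level.
    seeds = {
        acc: entry["ec"]
        for acc, entry in swissprot.items()
        if entry["evidence"] == "protein level"
    }

    # (D) The final database starts with the seeds themselves.
    database = dict(seeds)

    # (B, C) Pull in the other members of every cluster that contains a seed.
    for members in uniref50.values():
        seed_ecs = {seeds[m] for m in members if m in seeds}
        if len(seed_ecs) != 1:
            # Assumption: skip clusters with no seed or with conflicting seeds.
            continue
        ec = next(iter(seed_ecs))
        for member in members:
            # Non-seed members inherit the seed annotation.
            database.setdefault(member, ec)
    return database


toy_swissprot = {
    "P1": {"evidence": "protein level", "ec": "1.1.1.1"},
    "P2": {"evidence": "homology", "ec": "2.7.1.1"},
}
toy_uniref50 = {
    "cluster_1": ["P1", "Q9"],
    "cluster_2": ["P2", "Q8"],
}
toy_database = build_database(toy_swissprot, toy_uniref50)
```

In this toy input only the first cluster contains a seed, so `Q9` inherits the annotation of `P1`, while `P2` and `Q8` are left out.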
Table 2.
Similarity sets created for every amino acid pair to complement semi-conserved amino acid positions with other highly similar amino acids.
Fig 3.
Semi-conserved pattern positions are extended with amino acids from PAM250-based similarity sets.
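One way to picture this extension is the sketch below. The similarity sets shown are placeholders chosen for illustration, not the values from Table 2, and the rule of taking the union over every residue pair observed at a position is an assumption about how the sets are applied.

```python
from itertools import combinations

# Hypothetical similarity sets keyed by amino acid pair (NOT the Table 2 values).
SIMILARITY_SETS = {
    frozenset("IL"): set("ILMV"),
    frozenset("DE"): set("DENQ"),
    frozenset("FY"): set("FWY"),
}


def extend_position(residues, similarity_sets=SIMILARITY_SETS):
    """Add similar amino acids to one semi-conserved position."""
    extended = set(residues)
    for a, b in combinations(sorted(residues), 2):
        # Pairs without a similarity set contribute nothing.
        extended |= similarity_sets.get(frozenset((a, b)), set())
    return extended


def extend_pattern(positions):
    """Build a PROSITE-style pattern string.

    positions: list of strings; a single residue marks a conserved
    position, several residues mark a semi-conserved position.
    """
    parts = []
    for residues in positions:
        if len(residues) == 1:
            parts.append(residues)
        else:
            parts.append("[" + "".join(sorted(extend_position(residues))) + "]")
    return "-".join(parts)


toy_pattern = extend_pattern(["G", "IL", "D", "FY"])
```

With these placeholder sets, the semi-conserved position `IL` becomes `[ILMV]` and `FY` becomes `[FWY]`, while the conserved positions `G` and `D` are left unchanged.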
Table 3.
Proposed EC number examples for patterns created from sequences with different EC numbers.
Table 4.
Evaluation results for BrEPS 1.0 and BrEPS 2.0 using UniProt release 2014_10.
Table 5.
Evaluation results for BrEPS 2.0 using UniProt release 2017_01.
Table 6.
Comparison of InterPro 63.0 with BrEPS 2017.1.