PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences

doi:10.1371/journal.pone.0253411

Fig 1.

Flow chart describing the basic procedure of PDBrenum.

Fig 2.

Small fragments of the Pandas dataframes assembled from SIFTS files: (A) 2vl3, (B) 2aa3, and (C) 1d5t.

Entry 2vl3 contains a His tag that is not observed in the coordinates. Chain D of 2aa3 contains insertion codes (column 2) for some residues. Entry 1d5t contains a His tag with negative author residue numbers (column 2). The PDBe column in each image contains a tuple (SeqResNum, ResName, EntityId) for each amino acid, where SeqResNum is the position of the amino acid in the sequence, numbered from 1 to N (the length of the sequence). This field acts as the Pandas dataframe index for the whole table, since it is unique for each amino acid. The PDB column contains tuples (AuthResNum + InsCode, ResName, ChainID). The UniProt column contains tuples (UniProtResNum, UniProtResName, ChainID); if a residue has no UniProt residue number, the column contains the number “50,000”. The next column contains the UniProt AccessionID. The column UniProt_50k provides the final numbering of residues in the PDBrenum output file: it is the UniProt number when one is available, and 50,000+SeqResNum when a residue has no UniProt number in a chain that does have a UniProt mapping. Chains with no UniProt in SIFTS are not renumbered.
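The numbering rule in this caption can be sketched in a few lines of Python. This is a minimal illustration, not PDBrenum's actual code: the function name `final_residue_number`, the constant name `OFFSET_MMCIF`, and the use of `None` to mark a residue without a UniProt number are assumptions made for the example.

```python
from typing import Optional

# Offset described in the caption for residues without a UniProt number.
OFFSET_MMCIF = 50_000


def final_residue_number(seq_res_num: int,
                         uniprot_res_num: Optional[int],
                         offset: int = OFFSET_MMCIF) -> int:
    """Return the output residue number for one residue of a mapped chain.

    seq_res_num     -- 1-to-N position in the sequence (SeqResNum)
    uniprot_res_num -- UniProt position, or None if the residue has none
                       (None is an assumption of this sketch)
    """
    if uniprot_res_num is None:
        # No UniProt position (e.g., an expression tag): offset + SeqResNum.
        return offset + seq_res_num
    return uniprot_res_num


# Hypothetical residues: a tag residue at position 3, then a mapped residue.
print(final_residue_number(3, None))   # 50003
print(final_residue_number(20, 2))     # 2
```

Because the offset is far larger than any UniProt sequence position, the tag residues cannot collide with residues that carry real UniProt numbers.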

Fig 3.

Renumbering of the pdbx_poly_seq_scheme table from 2aa3 processed by PDBrenum.

Left: the original file from the PDB. Right: the renumbered file from PDBrenum. The original author numbering is given in column 6 of the table on the left (18, 19, 20, etc.); in column 6 of the table on the right it is replaced with the UniProt numbering (entry Q9PRK9) for this chain (2, 3, 4, etc.). The original author numbering has been placed in column 7 of the table on the right (i.e., in the auth_seq_num position).

Fig 4.

Log file of PDBrenum on PDB entry 2aa3.

In this log file, “SP” means “special case” and denotes which chains were handled in specific ways. It is “+” for cases where there is no clash in UniProt numbers. It is “*” for the case where a chain has more than one UniProt in SIFTS; the UniProt that represents the largest portion of the chain is taken unless it is in the exception list of UniProts often used as crystallization chaperones [GFP_AEQVI, GCN4_YEAST, C562_ECOLX, ENLYS_BPT4, MALE_ECOLI]. PDB_id is the 4-character PDB identifier; chain_PDB is the chain identifier given by the PDB in mmCIF files (label_asym_id); chain_auth is the chain identifier given by the authors of the structure (auth_asym_id); UniProt is the 6-character UniProt identifier (e.g., Q4PRK9); SwissProt is the human-readable UniProt identifier (e.g., P53_HUMAN); uni_len is the number of residues in the chain represented in the UniProt sequence; chain_len is the length of the chain sequence; renum is the total number of residues that were renumbered according to UniProt; 5k_or_50k is the number of residues that were renumbered by adding 5000 to the 1-to-N numbering for the PDB-legacy format or 50000 to the 1-to-N numbering for the mmCIF format.
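The “*” case described above (several UniProt entries mapped to one chain) can be sketched as follows. This is an illustration under stated assumptions rather than PDBrenum's implementation: the function name `pick_uniprot`, the input format (a dictionary from SwissProt identifier to the number of chain residues it covers), and the fallback used when every candidate is on the exception list are all hypothetical.

```python
from typing import Dict, Optional

# Exception list quoted in the caption: common crystallization chaperones.
CHAPERONE_EXCEPTIONS = frozenset(
    {"GFP_AEQVI", "GCN4_YEAST", "C562_ECOLX", "ENLYS_BPT4", "MALE_ECOLI"}
)


def pick_uniprot(coverage: Dict[str, int]) -> Optional[str]:
    """Choose the UniProt entry used to renumber a chain.

    coverage maps a SwissProt identifier to the number of residues of the
    chain that it covers (hypothetical input format).
    """
    if not coverage:
        return None
    # Prefer entries that are not known crystallization chaperones.
    preferred = {name: n for name, n in coverage.items()
                 if name not in CHAPERONE_EXCEPTIONS}
    # Assumption of this sketch: if only chaperones are mapped, use them.
    candidates = preferred or coverage
    # The entry covering the largest portion of the chain wins.
    return max(candidates, key=candidates.get)


# Hypothetical fusion construct: the lysozyme covers more residues,
# but it is on the exception list, so the receptor is chosen.
print(pick_uniprot({"ENLYS_BPT4": 160, "ADRB2_HUMAN": 120}))  # ADRB2_HUMAN
```

The residue counts in the usage line are invented for illustration and do not come from any specific PDB entry.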

Fig 5.

Renumbering PDB entry 2aa3.

Screenshots of the 2aa3 files before (left) and after (right) processing by PDBrenum: (A) “_atom_site” (coordinate section) and (B) “struct_conf”. Green arrows point to the columns that were changed.

Fig 6.

Screenshot of the PDBrenum server.

The server takes in a list of PDB entry codes (comma, space, tab, or newline separated) or a list of UniProt (e.g., P38398) or SwissProt (e.g., BRCA1_HUMAN) accession codes. Using check boxes, the user can choose whether to obtain mmCIF and/or PDB-format files, and whether to obtain asymmetric units and/or biological assemblies. If more than one file is requested, a zip file is returned, and the name of this file can be specified.
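The input handling described in this caption can be approximated with a short Python sketch. It is not the server's code: the function names `split_codes` and `classify`, and the heuristics used to tell the three identifier types apart (an underscore for SwissProt names, four characters starting with a digit for PDB entries), are assumptions made for illustration.

```python
import re
from typing import List


def split_codes(text: str) -> List[str]:
    """Split user input on commas, spaces, tabs, or newlines."""
    return [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]


def classify(code: str) -> str:
    """Guess the identifier type (heuristics are assumptions of this sketch)."""
    if "_" in code:
        return "swissprot"      # e.g., BRCA1_HUMAN
    if len(code) == 4 and code[0].isdigit():
        return "pdb"            # e.g., 2aa3
    return "uniprot"            # e.g., P38398


codes = split_codes("2aa3, 1d5t\t2vl3\nP38398 BRCA1_HUMAN")
print([(c, classify(c)) for c in codes])
```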

Fig 7.

Biological assemblies of human bone morphogenetic protein 2 (BMP2_HUMAN) downloaded with PDBrenum.

BMP2 is a homodimer (orange and blue) that binds Type I (magenta) and Type II receptors (cyan), RGM domain family members (yellow), and von Willebrand factor C-terminal domains (green).
