Fig 1.
Flow-chart describing basic procedure of PDBrenum.
Fig 2.
Small fragments of the Pandas dataframes assembled from SIFTS files: (A) 2vl3, (B) 2aa3, and (C) 1d5t.
Entry 2vl3 contains a His tag that is not observed in the coordinates. Chain D of 2aa3 contains insertion codes (column 2) for some residues. Entry 1d5t contains a His tag with negative author residue numbers (column 2). The PDBe column in each image contains data for each amino acid in tuples (SeqResNum, ResName, and EntityId), where SeqResNum is the position of the amino acid in the sequence numbered from 1 to N (the length of the sequence). This field acts as the Pandas dataframe index for the whole table, since it is unique for each amino acid. The PDB column contains tuples (AuthResNum + InsCode, ResName, and ChainID). The UniProt column contains tuples (UniProtResNum, UniProtResName, and ChainID); if there is no UniProt residue number, it contains the number “50,000”. The next column contains the UniProt AccessionID. The column UniProt_50k provides the final numbering of residues in the PDBrenum output file: it is the UniProt number when one is available, and 50,000+SeqResNum when a residue has no UniProt number but belongs to a chain that is mapped to a UniProt entry. Chains with no UniProt in SIFTS are not renumbered.
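The numbering rule behind the UniProt_50k column can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the caption, not the PDBrenum source code; the function name `final_residue_number` and its arguments are hypothetical, and a missing UniProt residue number is represented here as `None` rather than the placeholder “50,000”.

```python
from typing import Optional

# Offset stated in the caption for residues that lack a UniProt number.
OFFSET = 50_000


def final_residue_number(seq_res_num: int,
                         uniprot_res_num: Optional[int],
                         chain_has_uniprot: bool,
                         auth_res_num: str,
                         offset: int = OFFSET) -> str:
    """Return the residue number to write out (hypothetical helper)."""
    if not chain_has_uniprot:
        # Chains with no UniProt in SIFTS are not renumbered:
        # keep the author number (which may carry an insertion code).
        return auth_res_num
    if uniprot_res_num is None:
        # Residue without a UniProt number (e.g. a His tag) in a mapped chain:
        # offset + position in the 1-to-N sequence numbering.
        return str(offset + seq_res_num)
    # Normal case: use the UniProt residue number.
    return str(uniprot_res_num)


print(final_residue_number(1, None, True, "-5"))
print(final_residue_number(20, 4, True, "20"))
print(final_residue_number(3, None, False, "3A"))
```

Keeping the offset as a parameter leaves room for a different value where the output format requires one.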
Fig 3.
Renumbering of the pdbx_poly_seq_scheme table from 2aa3 processed by PDBrenum.
Left: the original file from the PDB. Right: the renumbered file from PDBrenum. The original author numbering is given in column 6 of the table on the left (18, 19, 20, etc.) and is replaced with the UniProt numbering (entry Q9PRK9) for this chain (2, 3, 4, etc.) in column 6 of the table on the right. The original author numbering has been placed in column 7 of the table on the right (i.e., in the auth_seq_num position).
Fig 4.
Log file of PDBrenum on PDB entry 2aa3.
In this log file, “SP” means “special case” and denotes which chains were handled in specific ways. It is “+” for cases where there is no clash in UniProt numbers. It is “*” for the case where a chain has more than one UniProt in SIFTS; the UniProt that represents the largest portion of the chain is taken unless it is in the exception list of UniProts often used as crystallization chaperones [GFP_AEQVI, GCN4_YEAST, C562_ECOLX, ENLYS_BPT4, MALE_ECOLI]. PDB_id is the 4-character PDB identifier; chain_PDB is the chain identifier given by the PDB in mmCIF files (label_asym_id); chain_auth is the chain identifier given by the authors of the structure (auth_asym_id); UniProt is the 6-character UniProt identifier (e.g., Q4PRK9); SwissProt is the human-readable UniProt identifier (e.g., P53_HUMAN); uni_len is the number of residues in the chain represented in the UniProt sequence; chain_len is the length of the chain sequence; renum is the total number of residues that were renumbered according to UniProt; 5k_or_50k is the number of residues that were renumbered by adding 5000 to the 1-to-N numbering for the PDB-legacy format or 50000 to the 1-to-N numbering for the mmCIF format.
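The “*” case can be illustrated with a short sketch. It assumes the per-chain coverage is available as a dictionary from SwissProt name to the number of chain residues mapped to that entry; the function name `pick_uniprot` and this input layout are hypothetical. The caption does not say what happens when every mapped UniProt is on the exception list, so the fallback used here (take the largest one anyway) is an assumption.

```python
from typing import Dict, Optional

# Exception list from the caption: UniProts often used as crystallization chaperones.
CHAPERONES = {"GFP_AEQVI", "GCN4_YEAST", "C562_ECOLX", "ENLYS_BPT4", "MALE_ECOLI"}


def pick_uniprot(coverage: Dict[str, int]) -> Optional[str]:
    """Pick the UniProt covering the largest part of a chain (hypothetical helper)."""
    if not coverage:
        return None
    # Prefer entries that are not on the chaperone exception list.
    candidates = {name: n for name, n in coverage.items() if name not in CHAPERONES}
    if not candidates:
        # Assumption: if only chaperones are mapped, keep the largest of them.
        candidates = coverage
    # The entry with the most mapped residues wins.
    return max(candidates, key=candidates.get)


print(pick_uniprot({"ENLYS_BPT4": 160, "ADRB2_HUMAN": 120}))
```

In this example the T4 lysozyme fusion covers more residues, but the receptor is chosen because the lysozyme entry is on the exception list.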
Fig 5.
Screenshots of the 2aa3 file before (left) and after (right) PDBrenum: “_atom_site” (coordinate section) (A) and “struct_conf” (B). Green arrows point to the columns that were changed.
Fig 6.
Screenshot of the PDBrenum server.
The server takes in a list of PDB entry codes (comma, space, tab, or newline separated) or a list of UniProt (e.g., P38398) or SwissProt (e.g., BRCA1_HUMAN) accession codes. Using check boxes, the user can choose whether to obtain mmCIF and/or PDB-format files, and whether to obtain asymmetric units and/or biological assemblies. If more than one file is requested, a zip file is returned, and the name of this file can be specified.
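A minimal sketch of how such a mixed-delimiter list could be split into individual codes is shown here. It is not the server's actual code; the function name `parse_ids` is hypothetical, and no validation of the codes is attempted.

```python
import re
from typing import List


def parse_ids(text: str) -> List[str]:
    """Split a list of codes on commas, spaces, tabs, or newlines (hypothetical helper)."""
    # One or more commas/whitespace characters act as a single separator;
    # empty tokens from leading or trailing separators are dropped.
    return [token for token in re.split(r"[,\s]+", text) if token]


print(parse_ids("2aa3, 1d5t\t2vl3\nBRCA1_HUMAN"))
```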
Fig 7.
Biological assemblies of human bone morphogenetic protein 2 (BMP2_HUMAN) downloaded with PDBrenum.
BMP2 is a homodimer (orange and blue) that binds Type I (magenta) and Type II receptors (cyan), RGM domain family members (yellow), and von Willebrand factor C-terminal domains (green).