A novel Porphyromonas gingivalis enzyme: An atypical dipeptidyl peptidase III with an ARM repeat domain

Porphyromonas gingivalis, an asaccharolytic Gram-negative oral anaerobe, is a major pathogen associated with adult periodontitis, a chronic infective disease that affects a significant percentage of the human population. It preferentially utilizes dipeptides as its carbon source, suggesting the importance of dipeptidyl peptidase (DPP) types of enzyme for its growth. Until now DPP IV, DPP5, 7 and 11 have been extensively investigated. Here, we report the characterization of DPP III using molecular biology, biochemical, biophysical and computational chemistry methods. In addition to the expected evolutionarily conserved regions of all DPP III family members, PgDPP III possesses a C-terminal extension containing an Armadillo (ARM) type fold similar to the AlkD family of bacterial DNA glycosylases, implicating it in alkylation repair functions. However, complementation assays in a DNA repair-deficient Escherichia coli strain indicated the absence of alkylation repair function for PgDPP III. Biochemical analyses of recombinant PgDPP III revealed activity similar to that of DPP III from Bacteroides thetaiotaomicron, and in the range between activities of human and yeast counterparts. However, the catalytic efficiency of the separately expressed DPP III domain is ~1000-fold weaker. The structure and dynamics of the ligand-free enzyme and its complexes with two different diarginyl arylamide substrates were investigated using small angle X-ray scattering, homology modeling, MD simulations and hydrogen/deuterium exchange (HDX). The correlation between the experimental HDX and MD data improved with simulation time, suggesting that the DPP III domain adopts a semi-closed or closed form in solution, similar to that reported for human DPP III. The obtained results reveal an atypical DPP III with increased structural complexity: its superhelical C-terminal domain contributes to peptidase activity and influences DPP III interdomain dynamics. Overall, this research reveals multifunctionality of PgDPP III and opens new directions for future research on DPP III family proteins.

Introduction
studies revealed similarities but also differences between the properties of BtDPP III and its eukaryotic DPP III homologs [24][25][26].
According to a bioinformatic analysis of the predicted amino acid sequence, peptidase M49 from P. gingivalis W83 contains the hexapeptide HEXXXH active site motif, and four additional evolutionarily conserved regions of the DPP III family [27], all of which are essential for the catalytic function of DPP III. Therefore, it is reasonable to assume that P. gingivalis M49 peptidase (DPP III) would cleave N-terminal dipeptides from its substrates. The findings of Takahashi and Sato [28,29] clearly indicate that P. gingivalis uses dipeptides preferentially as its sole source of carbon, suggesting the importance of dipeptidyl peptidase type of enzymes for the growth of this asaccharolytic bacterium. This indicates that peptidase M49 might be an important hydrolase for the survival of the bacterium P. gingivalis W83. This motivated us to characterize this protein using a combination of experimental and computational approaches.

Cloning
The gene encoding full-length PgDPP III consisting of 886 aa (PG_0317, 2661 bp) was obtained by PCR using genomic DNA of P. gingivalis strain W83, kindly provided by Dr. Margaret Duncan (The Forsyth Institute, Cambridge, MA, USA). The amplified gene was cloned into a pET21a plasmid between NheI and XhoI restriction sites. The DPP III domain fragment of PgDPP III (1-679 aa) and the AlkD-like alkylation domain fragments (648-886, 660-886 and 675-886 aa) were cloned into a pLATE31 plasmid (aLICator kit, Thermo Fisher Scientific, Waltham, MA, USA). Replacement of Glu433 with Ala in the inactive variant (E433A) was performed according to the instructions of the Q5® Site-Directed Mutagenesis Kit (NEB). All constructs were cloned with a single C-terminal His tag. The primers, listed in S1 Table, were custom synthesized by Sigma Aldrich (St. Louis, MO, USA). Sequences of the cloned products were obtained with the automatic sequence analyzer ABI PRISM® 3100-Avant Genetic Analyser.

Overexpression, purification, and characterization of recombinant DPP III proteins
Appropriate clones were transformed into the Escherichia coli expression strain BL21-Codon-Plus(DE3)-RIL. The cells were grown in Luria Bertani broth with 100 μg/mL ampicillin. When the cells reached an OD600 of 0.6, the flasks were transferred to 16˚C and induction of protein expression was carried out overnight by adding 0.1 mM IPTG (isopropyl β-D-1-thiogalactopyranoside). After harvesting, the cell pellets were resuspended in lysis buffer (20 mM NaPO4, 500 mM NaCl, 20 mM imidazole, 10 mM 2-mercaptoethanol, 1% glycerol, pH 8.0) and lysed by adding lysozyme followed by sonication on ice. A first purification step was done using Ni-NTA affinity chromatography according to Jajčanin-Jozić et al. [30], eluting the protein with 500 mM imidazole. Subsequent purification steps involved gel filtration on a HiLoad 16/60 Superdex 200 Prep grade column, followed by anion exchange on a HiPrep Q HP 16/10 column. Protein purity was confirmed by SDS-PAGE.
Secondary structure and temperature stability of recombinant proteins were determined by recording circular dichroism (CD) spectra on a Jasco J-815 spectropolarimeter (JASCO, Easton, MD, USA) with automatic temperature control according to Jajčanin-Jozić et al. [30]. Analysis of protein secondary structure from CD spectra was performed by using the CDSSTR program at the DICHROWEB site (http://public-1.cryst.bbk.ac.uk/cdweb/html). Protein concentrations were determined by the Bradford method [31].

Enzyme activity determination and kinetic analysis
DPP III activity was determined by a standard colorimetric assay described previously [32] using Arg-Arg-2-naphthylamide (Arg2-2NA) and Arg-Arg-7-amido-4-methylcoumarin (Arg2-AMC) as substrates, at 37˚C and pH 8.0, in 50 mM Tris-HCl buffer (I = 0.01) containing 100 μM CoCl2. For the determination of kinetic parameters (Km and kcat) for the hydrolysis of the diarginyl substrates, initial hydrolysis rates were measured fluorimetrically at 25˚C and at pH 8.0 in the presence of 100 μM CoCl2 [32]. The kinetic parameters were determined from the initial reaction rates, using a nonlinear regression program (GraphPad Prism version 5.04; GraphPad, La Jolla, CA, USA).
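The nonlinear regression step can be illustrated with a minimal sketch, assuming the standard Michaelis-Menten rate law v = Vmax·[S]/(Km + [S]) and kcat = Vmax/[E]. The function name, the Gauss-Newton refinement and all numerical values below are illustrative assumptions; they do not reproduce the GraphPad Prism procedure or any data from this study.

```python
def fit_michaelis_menten(s, v, iterations=20):
    """Least-squares fit of v = Vmax*S/(Km+S); returns (Vmax, Km)."""
    n = len(s)
    # Starting values from the Hanes-Woolf linearisation: S/v = S/Vmax + Km/Vmax
    y = [si / vi for si, vi in zip(s, v)]
    mean_s = sum(s) / n
    mean_y = sum(y) / n
    slope = (sum((si - mean_s) * (yi - mean_y) for si, yi in zip(s, y))
             / sum((si - mean_s) ** 2 for si in s))
    intercept = mean_y - slope * mean_s
    vmax, km = 1.0 / slope, intercept / slope
    # Gauss-Newton refinement on the untransformed rates
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi in zip(s, v):
            resid = vi - vmax * si / (km + si)
            j1 = si / (km + si)                # d(model)/dVmax
            j2 = -vmax * si / (km + si) ** 2   # d(model)/dKm
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * resid
            b2 += j2 * resid
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        vmax += (a22 * b1 - a12 * b2) / det
        km += (a11 * b2 - a12 * b1) / det
    return vmax, km


# Hypothetical, noise-free initial rates (uM/s) at several substrate levels (uM)
substrate = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
rates = [10.0 * si / (5.0 + si) for si in substrate]

vmax, km = fit_michaelis_menten(substrate, rates)
enzyme_conc = 0.1                 # uM, hypothetical
kcat = vmax / enzyme_conc         # s^-1
print(f"Vmax = {vmax:.3f} uM/s, Km = {km:.3f} uM, kcat = {kcat:.1f} s^-1")
```

Fitting the untransformed rates, rather than relying on a linearised plot alone, avoids the distortion of the error structure that linear transformations introduce.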

Isothermal titration calorimetry (ITC)
Microcalorimetric data for ligands binding to the inactive E433A variant were measured in 50 mM Tris-HCl pH 8.0, 100 mM NaCl with a VP-ITC microcalorimeter (MicroCal, Northampton, MA, USA) equilibrated at 25˚C. For this purpose, all ligands were dissolved in the same buffer as that of the purified enzyme. Both protein and ligand solutions were degassed immediately before the measurement. A titration experiment involved 30 injections using a 400 μM solution of peptide in the syringe against a 40 μM solution of PgDPP III E433A in the measuring cell. In a standard experiment, one aliquot of 2 μL and 29 aliquots of 10 μL of the peptide solution were injected into 2 mL of the protein solution under constant stirring at 270 rpm. Every injection was carried out over a period of 20 s with a spacing of 225 s between the injections. The corresponding heat of binding was calculated by integrating the area under each observed peak in the thermogram. A reference measurement in which the peptide was injected into the buffer was performed in the same way and was subtracted to correct for the heat of dilution of the peptide. Nonlinear least-squares fitting using Origin version 7.0 (MicroCal) was used to obtain association constants (Ka), heats of binding (ΔH) and stoichiometries. All measurements were made in duplicate.
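The quantities obtained from such a fit are related by standard thermodynamic identities (Kd = 1/Ka, ΔG = −RT ln Ka, TΔS = ΔH − ΔG). The sketch below illustrates them together with the nominal ligand-to-protein ratio reached at the end of the titration scheme described above. The function names and the Ka and ΔH values are hypothetical, and the ratio calculation deliberately ignores dilution and volume displacement in the cell.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def binding_thermodynamics(ka_per_molar, delta_h_j_per_mol, temperature_k=298.15):
    """Return (Kd in M, dG in J/mol, T*dS in J/mol) from Ka and dH."""
    kd = 1.0 / ka_per_molar
    delta_g = -R * temperature_k * math.log(ka_per_molar)
    t_delta_s = delta_h_j_per_mol - delta_g
    return kd, delta_g, t_delta_s


def final_molar_ratio(syringe_um, cell_um, cell_volume_ul, injections_ul):
    """Nominal ligand:protein ratio after all injections (no dilution correction)."""
    return syringe_um * sum(injections_ul) / (cell_um * cell_volume_ul)


# Injection scheme from the text: 1 x 2 uL followed by 29 x 10 uL
injections = [2.0] + [10.0] * 29
ratio = final_molar_ratio(400.0, 40.0, 2000.0, injections)

# Hypothetical endothermic binding event at 25 C
kd, dg, tds = binding_thermodynamics(1.0e5, 20_000.0)
print(f"final ligand:protein ratio = {ratio:.2f}")
print(f"Kd = {kd:.1e} M, dG = {dg / 1000:.1f} kJ/mol, TdS = {tds / 1000:.1f} kJ/mol")
```

For an endothermic binding event, as in the hypothetical example, a negative ΔG requires a favourable entropic term.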

Small angle X-ray scattering (SAXS)
Data collection for the SAXS studies was performed using a PILATUS 1M Image Plate detector at the BM29 BIOSAXS beamline at the European Synchrotron Radiation Facility (ESRF), Grenoble, France. The distance between the sample and the detector was 2.5 m. The protein samples for full-length PgDPP III were measured at three different concentrations: 8.8, 4.43, and 1.02 mg/mL, in 50 mM Tris-HCl, pH 8.0, 100 mM NaCl buffer. Bovine serum albumin (BSA) at a concentration of 4.5 mg/mL was used as standard solution. The program PRIMUS was used to perform data analysis; scattering from the buffer was subtracted as background from the protein measurements [33]. Data from multiple concentrations were merged for data analysis and the evaluation of the radius of gyration (Rg) and the forward scattering intensity (I(0)) was performed using the Guinier approximation [34]. The pair distribution function was calculated with GNOM [35]. The theoretical scattering curve based on the atomic structure of the protein was calculated using CRYSOL and the subsequent calculation of the pair distribution function was performed using GNOM [35][36][37].
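As an illustration of the Guinier analysis, the following minimal sketch assumes the usual low-angle form ln I(q) = ln I(0) − (Rg²/3)·q² and recovers Rg and I(0) by linear regression of ln I against q². The function name and the synthetic data are hypothetical and do not represent the PRIMUS implementation or the measured curves.

```python
import math


def guinier_fit(q, intensity):
    """Linear fit of ln I versus q^2; returns (Rg, I0)."""
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    rg = math.sqrt(-3.0 * slope)   # slope = -Rg^2 / 3
    i0 = math.exp(intercept)
    return rg, i0


# Synthetic, noise-free data for a hypothetical particle with Rg = 30 A, I(0) = 100.
# q is limited so that q*Rg stays below about 1.3, the usual Guinier range.
q_values = [0.005 + 0.005 * i for i in range(8)]             # A^-1
i_values = [100.0 * math.exp(-(30.0 ** 2) * qi ** 2 / 3.0) for qi in q_values]

rg, i0 = guinier_fit(q_values, i_values)
print(f"Rg = {rg:.2f} A, I(0) = {i0:.2f}")
```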

HPLC-MS analysis
HPLC-MS analyses were carried out on an Agilent 1260 Infinity HPLC system (consisting of a G1311B quaternary pump, a G1329B autosampler, a G1316A thermostated column compartment and a G1314F variable wavelength detector) equipped with an Agilent ZORBAX SB-C8 column (15 mm × 4.6 mm, particle size: 3.5 μm) and coupled to an Agilent 6120 quadrupole LC-MS detector. The column effluent was split using a standard T-piece to allow the simultaneous recording of UV/Vis and MS signals. Method 'GRADIENT_A-B_PEPTIDE_POS': Eluents: water cont. 0.1% formic acid (A), acetonitrile cont. 0.1% formic acid (B); gradient elution program: 10% to 60% B over 25 min, 60% B for 1 min, 60% to 10% B over 4 min, 10% B for 5 min; injection volume: 5 μL; UV/Vis detection wavelength: 215 nm; MS ionization mode: ESI, pos.; MS spray chamber settings: drying gas temperature: 300˚C, drying gas flow 10 mL/min, nebulizer pressure: 35 psig, capillary voltage 3000 V; MS signal 1: scan m/z = 100-1100, step size: 0.1, fragmentor voltage: 150 V, cycle time: 80%; MS signal 2: SIM on sample target masses, fragmentor voltage 150 V, cycle time: 20%.
The HPLC-MS analysis was performed to determine whether the peptides act as substrates or inhibitors. For this purpose, a mixture of 200 μL containing a 1 mM solution of the corresponding peptide and a 0.15 mM solution of PgDPP III was incubated for 24 hours at room temperature and then analyzed using the method described above. Additionally, time series measurements were performed to obtain the velocity of angiotensin II degradation. For this purpose, a mixture of angiotensin II (1 mM) and PgDPP III (0.15 mM) was incubated and the reaction was stopped using HPLC-grade acetonitrile at 5 min, 30 min, 1 h, 2 h, 3 h, 4 h, 5 h, 6 h and 24 h. Corresponding time series were recorded for the degradation of angiotensin II by human DPP III, which was expressed and purified as described previously [38].
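The gradient elution program stated in the method can be written out as a piecewise-linear function of time, which makes the total run time (35 min) and the eluent composition at any retention time explicit. This is only an illustrative sketch; the names `GRADIENT` and `percent_b` are hypothetical and not part of any instrument software.

```python
# (time in min, % eluent B) breakpoints taken from the gradient program in the text:
# 10% to 60% B over 25 min, 60% B for 1 min, 60% to 10% B over 4 min, 10% B for 5 min
GRADIENT = [(0.0, 10.0), (25.0, 60.0), (26.0, 60.0), (30.0, 10.0), (35.0, 10.0)]


def percent_b(t_min):
    """Nominal % B at time t_min by linear interpolation between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


total_run_time = GRADIENT[-1][0]
print(f"total run time: {total_run_time:.0f} min")
for t in (0.0, 8.1, 12.5, 25.5, 28.0, 35.0):
    print(f"t = {t:5.1f} min -> {percent_b(t):.1f}% B")
```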

MMS complementation assay
The B. cereus plasmid pUC18alkD and the DNA glycosylase-deficient strain E. coli BK2118 (tag, alkA) were a kind gift from Magnar Bjørås (Oslo University Hospital/University of Oslo). Full-length P. gingivalis DPP III, the C-terminal AlkD-like domain (648-886 aa), and P. gingivalis AlkD (Uniprot code Q7MV52) were subcloned into the pUC18 plasmid between EcoRI and PstI restriction sites. Primers are listed in S1 Table. For plasmid propagation, E. coli TOP10 cells (Invitrogen, USA) were used. The complementation test was performed as described previously [39]. Briefly, E. coli BK2118 was transformed with the pUC18 constructs using Roti®-Transform (Roth) according to the manufacturer's instructions. We used an empty pET21a plasmid as a negative control, and the original B. cereus pUC18alkD as a positive control. BK2118 clones complementing the alkylation-sensitive phenotype were selected on Luria-Bertani (LB) agar plates containing 1, 2 or 5 mM of the alkylating agent methyl methanesulphonate (MMS) and ampicillin at a concentration of 50 μg/mL. From selected clones, overnight cultures were prepared and the next day serial dilutions of bacteria were plated on LBA/MMS plates. These plates were incubated for 2 days at 37˚C.

Computational methods
Bioinformatics and homology modelling. As the 3D structure of the DPP III from P. gingivalis has not yet been determined experimentally, we resorted to comparative modelling. The sequence was retrieved from the UniProt database (http://www.uniprot.org) and the domain structure and organization of PgDPP III was predicted using two different approaches, the web server Phyre2, http://www.sbg.bio.ic.ac.uk/~phyre2 [40] and the stand-alone program Modeller9 [41,42]. The model was built using two templates. The model of the 3D structure of the DPP III domain (amino acids 1-659) was determined using the experimentally determined structure of BtDPP III (PDB_code: 5NA7) as a template using the Phyre2 server. The sequence similarity between BtDPP III and the DPP III domain of PgDPP III is 51%. In order to identify a suitable template for the ARM domain a PSI-BLAST search was done using the C-terminal region (amino acids 660-886). A multiple sequence alignment was done using Clustal O 1.2.1 [43] at http://www.ebi.ac.uk/Tools/msa/clustalo/. Ultimately, the model of the ARM domain (660-886 aa residues) was determined using the program Modeller with the experimentally determined structure of AlkF from B. cereus (PDB_code 3ZBO, sequence similarity 15%) as a template.
Molecular dynamics, system parametrization and preparation. The zinc ion was added into the structure of PgDPP III determined by homology modeling according to its position in BtDPP III. Since the mode of Zn2+ binding is highly preserved in all DPP III orthologues, we consider such a procedure justified. The obtained structure was used as the initial structure for MD simulations. The protein parametrization was performed within the ff14SB [44] force field using leap, a basic preparation program for Amber simulations available within the AMBER16 package (http://ambermd.org) [45]. For the zinc ion, parameters derived in previous work were used [46].
All Arg and Lys residues in the structure were positively charged (+1e) while Glu and Asp residues were negatively charged (-1e), as expected at physiological (experimental) conditions. The protonation of histidines was checked according to their ability to form hydrogen bonds with neighbouring amino acid residues or to coordinate the metal ion. The substrates were parameterized within the generalized amber force field (gaff) [47] and the missing parameters were derived using the Antechamber module [48] from the Amber16 suite of programs.
The proteins and protein-substrate complexes were placed in a truncated octahedron box filled with TIP3P water molecules [49], and Na+ ions [50] were added in order to neutralize the systems.
MD simulations. Before running productive molecular dynamics simulations, the protein geometry was optimized in three cycles (each 1500 steps) and the system was equilibrated. In the first cycle of optimization, water molecules were relaxed, while the rest of the system was harmonically restrained with a force constant of 32 kcal mol⁻¹ Å⁻¹. In the second and third cycle, the same force constant (32 kcal mol⁻¹ Å⁻¹) was applied to the zinc cation, while the protein backbone was restrained with force constants of 12 and 2 kcal mol⁻¹ Å⁻¹, respectively. The energy minimization procedure, consisting of 470 steps of steepest descent followed by conjugate gradient optimization for the remaining steps, was the same in all cycles. During the first period of equilibration (200 ps of gentle heating from 0 to 300 K with a time step of 1 fs), the NVT ensemble was used, while all of the following simulations were performed at constant temperature and pressure (300 K and 1 atm, the NpT ensemble). During equilibration, the zinc ion and/or its ligands were weakly restrained. The temperature was held constant using a Langevin thermostat [51] with a collision frequency of 1 ps⁻¹. The pressure was regulated by a Berendsen barostat [52]. Bonds involving hydrogen atoms were constrained using the SHAKE algorithm [53]. The ligand-free protein was equilibrated for 50 ns (with time steps of 1 fs and 2 fs, for the first 1.5 ns and the remaining 48.5 ns, respectively), while the sampling was performed during the subsequent 150 ns productive MD simulation. Furthermore, the structure of the ligand-free protein, obtained by ligand extraction from the PgDPP III-Arg2-2NA complex after a 100 ns simulation, was used for two additional 150 ns MD simulations of the ligand-free protein.
Docking. The structure obtained after 200 ns (50 ns of equilibration + 150 ns of productive MD) of MD simulation of the initial homology model of PgDPP III was used to build the PgDPP III substrate complexes. The PgDPP III-Arg2-2NA and PgDPP III-Arg2-AMC complexes were constructed using utilities of the program Pymol (The PyMOL Molecular Graphics System, Version 1.7 Schrödinger, LLC). The crystal structure of the tynorphin complex of the E451A variant of human DPP III (PDB code: 3T6B) was used as a template. Arg2-2NA was aligned to the bound tynorphin and then manually adjusted to avoid clashes with the enzyme. The obtained complex was energy minimized and equilibrated using the same procedure as described above for the ligand-free enzyme. Since His437 moved away from the zinc ion during the equilibration, we used steered MD simulations (with a pulling force of 50 kcal mol⁻¹ Å⁻¹) to bring it back. The structure thus obtained was again equilibrated for 30 ns (time step 1 fs) and two replicas of the PgDPP III-Arg2-2NA complex were simulated at constant temperature and pressure (300 K and 1 atm, the NpT ensemble, time step 2 fs), one for 200 ns and the other for 150 ns. The PgDPP III-Arg2-AMC complex was built from the equilibrated structure of the PgDPP III-Arg2-2NA complex. After energy minimization and a short equilibration (40 ps) its trajectory was simulated for 150 ns.
Data analysis. In order to analyze and characterize the conformational space that PgDPP III structures span, as well as to determine the most relevant motions associated with protein closure, several types of data analysis were performed. All calculations were performed with the CPPTRAJ module of the Amber14 program package [45].

Hydrogen/deuterium exchange (HDX) analysis
Hydrogen/deuterium exchange experiments (HDX) were performed as described previously [54]. A stock solution of 45 μM PgDPP III in 20 mM Tris HCl, pH = 7.4 was prepared as well as the exchange buffer of the same composition and pH in D2O. H/D exchange reactions were carried out at room temperature and were started each time by diluting 5 μL of the stock solution into 45 μL of the exchange buffer. Reactions were performed in triplicate for incubation periods of 10 sec, 1 min, 20 min, 1 h and 4 h, each followed by acid quenching (adding 10 μL of 2 M glycine, pH = 2.5) and on-line pepsin proteolysis for 1.5 min. Deuterium uptake was measured for 96 non-overlapping peptic peptides covering 89% of the PgDPP III amino acid sequence. The deuterium content (D) of those peptides was calculated by taking into account gains and losses of deuterons during digestion and the HPLC-MS measurement. An adjustment was made as proposed by Z. Zhang et al. [55] using control experimental data for nondeuterated and fully deuterated samples.
Correlation between HDX and MD simulation data. Comparisons of the experimental values for the deuterium content of peptides at each incubation time with the corresponding values predicted by MD simulations were carried out similarly to the procedure described previously [56]. Briefly, 50 ns long fragments of the MD trajectories were sampled every 1 ps, resulting in 50000 snapshots of the PgDPP III structure. Simulations starting directly from the homology model and the simulations of the PgDPP III extracted from the 150 ns simulated PgDPP III-Arg2-2NA complex were used for the comparison.
The open state of an amide hydrogen for the hydrogen/deuterium exchange reaction is defined by the number of snapshots in which either NH or CO is in contact with a water molecule. The closed state is defined by the number of all other snapshots. An 'in house' program was written in C# to collect backbone amide hydrogen bonding statistics by analysing frames of the MD trajectory. For each amide site the closed/open state ratio is calculated as follows:
Closed/Open = (2 × N(total) − N(NH-wat) − N(CO-wat)) / (N(NH-wat) + N(CO-wat))
where N denotes the number of snapshots with the characteristics specified in the brackets; for example, N(NH-wat) is the number of frames in which the amide nitrogen is hydrogen bonded to at least one water molecule. The calculated ratio is directly used as the amide site protection factor (PF) without any specific mapping function. Intrinsic chemical rates (k_int) for the hydrogen/deuterium exchange reaction of backbone amides are determined according to the procedure of Bai et al. [57]. In the case of PgDPP III, intrinsic rate constants were obtained as for poly-DL-alanine in D2O at 20˚C and pD_corr = 7.4, with non-blocked terminal amino acids. The calculation was done using the program Sphere (http://landing.foxchase.org/research/labs/roder/sphere) with default values for pKa and activation energy. Rate constants for each amide hydrogen site were obtained as k_pred = k_int/PF. The deuterium content for each PgDPP III peptide was calculated by summing the contributions of the amide hydrogen sites within the peptide, starting from the first residue next to the N-terminus and ending with the C-terminal residue.
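The chain from snapshot counts to a predicted peptide deuterium content can be sketched as follows. The protection factor and k_pred = k_int/PF follow the definitions given above. The per-site uptake 1 − exp(−k_pred·t) is an assumption: it is the usual first-order form for amide exchange, but the explicit expression used in this work is not reproduced in the text. All function names and numbers are hypothetical, and this is not the authors' C# program.

```python
import math


def protection_factor(n_total, n_nh_wat, n_co_wat):
    """Closed/open ratio used directly as the protection factor (PF)."""
    n_open = n_nh_wat + n_co_wat
    if n_open == 0:
        return math.inf            # never water-exposed in the sampled frames
    return (2 * n_total - n_open) / n_open


def predicted_uptake(k_int, pf, t_seconds):
    """Assumed first-order deuterium uptake of a single amide site."""
    if pf <= 0.0:
        return 1.0                 # always open: treated as fully exchanged
    return 1.0 - math.exp(-(k_int / pf) * t_seconds)


def peptide_deuterium(sites, t_seconds):
    """Sum the site contributions; `sites` holds (k_int, PF) pairs."""
    return sum(predicted_uptake(k, pf, t_seconds) for k, pf in sites)


# Hypothetical counts for one amide over 50000 snapshots
pf_example = protection_factor(50000, 10000, 15000)

# Hypothetical three-site peptide: (k_int in s^-1, PF)
sites = [(1.0, pf_example), (0.5, 1000.0), (2.0, math.inf)]
for t in (10, 60, 1200, 3600, 14400):   # incubation times from the text, in s
    print(f"t = {t:6d} s -> D = {peptide_deuterium(sites, t):.3f}")
```

A site that is never water-exposed in the sampled frames contributes no uptake in this scheme, which is one reason why the predicted values depend on the length of the trajectory fragment analysed.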

Results
Porphyromonas gingivalis peptidase M49 is an atypical DPP III with a C-terminal extension exhibiting an ARM-type fold
All characterized DPPs III (peptidases of the M49 family) are composed of 675-786 amino acids [20,22]. However, peptidase M49 from P. gingivalis W83 (PgDPP III) is an 886 amino acid residues long protein, 211 amino acids longer at the C-terminus than DPP III from B. thetaiotaomicron, and 149 amino acids longer than human DPP III.
We submitted sequences of the full-length protein (PgDPP III), the DPP III fragment (amino acids 1-659) and the C-terminal fragment (amino acids 660-886) to the prediction servers HHPred and Phyre2 [58,40]. Interestingly, whereas for the N-terminal DPP III fragment a typical M49 family fold was predicted as expected, the C-terminal fragment was predicted at high confidence to have an alpha-alpha superhelix fold, belonging to the Armadillo (ARM) type fold family of alkylpurine DNA glycosylase AlkD (Phyre2: confidence = 100%) (S2 Fig). HHPred also predicted two domains in PgDPP III: an N-terminal DPP III domain and a C-terminal AlkF/D domain (100% probability, E-value 2.1-7.1 E-31 for DNA glycosylase domain).
Genes coding for a fusion protein of dipeptidyl-peptidase III and an AlkD-like domain were found only within the Porphyromonas genus (Conserved Domain Database, February 2017, [59]). In Bacteroides and Prevotella genomes, on the other hand, we found these domains in close proximity, in the same order as in Porphyromonas, but as two separate genes (SyntTax, February 2017, http://archaea.u-psud.fr/synttax/).

Physicochemical and catalytic properties of the purified recombinant proteins
We first expressed and purified full-length PgDPP III protein. As we found structural similarity of the C-terminal fragment of PgDPP III with members of the DNA glycosylase family, we also produced the DPP III and C-terminal domains in isolation, for subsequent investigation.
Cloning, expression, and purification of the recombinant proteins were performed as described in the Materials and methods section. Enzyme activity was assayed using a set of synthetic dipeptide-2-naphthylamide substrates as described previously [60]. Full-length PgDPP III and the DPP III fragment (amino acids 1-679) were purified to apparent homogeneity according to SDS-PAGE, with estimated molecular weights of 102000 and 78000, respectively, which was in agreement with their predicted molecular masses (Fig 1A). We cloned and attempted to express three C-terminal fragments of different sizes. Only one (amino acids 648-886) was detected on SDS-PAGE. However, it could not be purified as all expressed protein was in insoluble inclusion bodies. Attempts to obtain soluble protein by refolding from inclusion bodies were not successful.
Peptidase activity assays with different dipeptidyl-2NAs showed a similar substrate specificity of PgDPP III as compared to BtDPP III and several other members of the M49 family, with a preference for Arg2-2NA [22]. In addition, Phe-Arg-2NA and Ala-Arg-2NA were hydrolysed with high rates (S2 Table).
Deconvolution of the CD spectra using the CDSSTR program predicted 45.6% helices, 10.8% strands, 17.0% turns and 26.8% of unordered secondary structure for full-length PgDPP III (Fig 1B). Comparing the CD spectra of full-length PgDPP III and of the DPP III fragment, as well as thermal denaturation curves (showing a Tm of 50˚C in both cases) revealed no significant difference in secondary structure or temperature stability. However, the specific activity of the DPP III fragment for the hydrolysis of Arg2-2NA was 250-fold lower compared to the full-length protein. We also determined the kinetic parameters of both the full-length enzyme and the DPP III fragment for the hydrolysis of the fluorogenic substrates Arg2-2NA and Arg2-AMC. As shown in Table 1, this kinetic analysis revealed pronounced differences between full-length PgDPP III and the DPP III fragment. The Km values for both substrates were increased 4-fold when the C-terminal domain was removed. A striking difference was also observed in the kcat values, which were reduced 120-fold in the case of Arg2-AMC, and 250-fold in the case of Arg2-2NA, resulting in a 470-fold and 1050-fold reduction in the catalytic efficiency, respectively.
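Because catalytic efficiency is the ratio kcat/Km, its fold reduction is the product of the fold decrease in kcat and the fold increase in Km. The short sketch below applies this to the rounded fold changes quoted above. The function name is hypothetical, and the small differences from the reported 470-fold and 1050-fold values presumably reflect rounding of the individual fold changes.

```python
def efficiency_fold_reduction(kcat_fold_decrease, km_fold_increase):
    """Fold reduction in kcat/Km from the individual fold changes."""
    return kcat_fold_decrease * km_fold_increase


# Rounded fold changes quoted in the text (DPP III fragment vs. full-length enzyme)
reduction_amc = efficiency_fold_reduction(120, 4)   # Arg2-AMC, reported as 470-fold
reduction_2na = efficiency_fold_reduction(250, 4)   # Arg2-2NA, reported as 1050-fold
print(f"Arg2-AMC: ~{reduction_amc}-fold, Arg2-2NA: ~{reduction_2na}-fold")
```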

Interaction with peptides
To investigate interactions of the inactive E433A variant of full-length PgDPP III with peptides, we chose three model peptides, which enable comparison to eukaryotic DPPs III: the pentapeptides tynorphin (VVYPW) and IVYPW, and the octapeptide angiotensin II (DRVYIHPF).
We performed isothermal titration calorimetry in order to determine the binding affinity. As in previous studies [26,38], we chose the inactive variant for our experiments, in order to avoid contamination of the calorimetric data by heat effects resulting from peptide hydrolysis. In the case of tynorphin and IVYPW, we obtained the same endothermic mode of binding as previously described for human DPP III [26], while the binding of angiotensin II was exothermic (Fig 2). The thermodynamic parameters obtained from the ITC experiments are presented in S3 Table. Angiotensin II showed a higher affinity than the other two peptides. The binding thermodynamics of these peptides to human DPP III have previously been investigated [26,38]. Overall, the human enzyme binds all three peptides more tightly than PgDPP III does.

Hydrolytic activity towards peptides
In a first set of biotransformation experiments wild type PgDPP III (0.16 mM) was incubated for 24 h separately with the three oligopeptides (1 mM) that showed binding to the protein in ITC experiments: angiotensin II, tynorphin and IVYPW. HPLC-MS analyses confirmed that all three peptides are substrates of PgDPP III (S3 Fig). In all reaction mixtures, the original
In view of the known manner of enzymatic action of dipeptidyl peptidases, the formation of a C-terminal tetrapeptide from the octapeptide angiotensin II is most likely the result of two consecutive N-terminal dipeptide cleavages. To substantiate this assumption, we carried out a biotransformation of angiotensin II with a significantly lower concentration of PgDPP III (2 mg/mL; 0.018 mM), and we followed the reaction time course to identify potential reaction intermediates. Two interesting observations were made in this experiment (S4 Fig). Firstly, the major isomer of angiotensin II (tR = 8.1 min) is converted much faster than the minor one (tR = 8.3 min). As a consequence, the isomeric ratio drops from 23:1 at the start of the reaction to 6:1 after 1 h. After 2 h the main isomer is no longer detectable and after 4 h the minor isomer is completely consumed. Secondly, the hexapeptide VYIHPF, which is, besides the N-terminal dipeptide (DR), the product of the first enzymatic cleavage and can hence be considered a reaction intermediate, could indeed be detected (tR = 9.1 min, m/z [M+H]+ = 775.4). Its concentration increased during the first 30 min of the reaction and decreased afterwards.
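The m/z value assigned to the hexapeptide intermediate can be checked from standard monoisotopic residue masses: the sum of the residue masses plus one water and one proton gives the singly protonated ion. The sketch below is illustrative; the function name is hypothetical and the mass table contains standard literature values rather than data from this study.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per linear peptide (free N- and C-termini)
PROTON = 1.00728


def mh_plus(sequence):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


for name, seq in (("angiotensin II", "DRVYIHPF"),
                  ("intermediate after loss of DR", "VYIHPF")):
    print(f"{name:30s} {seq:9s} [M+H]+ = {mh_plus(seq):.2f}")
```

The value calculated for VYIHPF agrees with the observed m/z of 775.4, which supports its assignment as the product of the first N-terminal dipeptide cleavage.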
For comparison, angiotensin II was incubated with human DPP III and this reaction mixture was analyzed by HPLC-MS. As shown in S4 Fig, the human enzyme hydrolyzed the peptide in the same way as PgDPP III, only faster. This confirms that angiotensin II is a good substrate for human DPP III.

Complementation assay in the DNA-repair-deficient E. coli strain
We predicted that the C-terminal fragment of PgDPP III has a fold similar to that of AlkD, a bacterial DNA glycosylase that removes positively charged methylpurines from DNA [61].
E. coli strain BK2118 is extremely sensitive to the alkylating agent methyl methanesulphonate (MMS), because it lacks both the AlkA and Tag 3-methyladenine (3mA) DNA glycosylases, which makes this strain alkylation repair-defective [39]. Functional complementation of the tag alkA double mutant of E. coli with a gene expressing 3mA DNA glycosylase activity was shown to restore alkylation resistance [39]. Therefore, in order to investigate the potential DNA alkylation repair function of PgDPP III, we transformed E. coli strain BK2118 with pUC18 constructs harbouring the PgDPP III full-length gene, the PgDPP III C-terminal domain, AlkD from Bacillus cereus and AlkD from P. gingivalis. Transformants were plated at different dilutions and grown on media containing different concentrations of MMS (1 mM to 5 mM) for two days. Full rescue was obtained with plasmids expressing AlkD from B. cereus or AlkD from P. gingivalis, but not with full-length PgDPP III, indicating that PgDPP III does not possess alkylation repair function (S5 Fig).

Study of the structure and dynamics of the full-length PgDPP III
To obtain deeper insight into the structural and dynamical properties of PgDPP III as well as to elucidate its interactions with substrates (ligands), we used molecular modelling combined with hydrogen/deuterium exchange measurements. Molecular modeling. Comparative modelling was used to derive a model of the PgDPP III as described in the Computational methods section. The structure of the N-terminal DPP III domain was modelled using the crystal structure of DPP III from B. thetaiotaomicron as template, whereas the model of the C-terminal region (ARM domain) was derived using the structure of AlkF from B. cereus (PDB-code 3ZBO) as template.
This  Table) and the entire enzyme structure became more compact with the highest compression occurring during the first 50 ns (equilibration time; Fig 4). The separation between the two lobes of the DPP III domain decreased, as reported for the human orthologue [62], but also a reorientation of the ARM fragment relative to the DPP III domain occurred (Fig 3). However, the secondary structure within the DPP III domain, as well as the zinc ion coordination, was mostly preserved during the simulations.
A principal components (PC) analysis revealed two dominant components which explained 94% of the total variance generated during MD simulations (PC1 describes 60.6%, and PC2 33.8% of the total variance). The most prominent motion, described by their eigenvectors corresponds to the closure of the DPP III fragment and to the overall protein compression. The first eigenvector describes displacement of the outer edges of the DPP III fragment cleft accompanied with rotation of the C-terminal ARM region in the direction of the N-terminal DPP III region (Fig 5). The second eigenvector describes a parallel shift of the lower DPP III domain and the lower part of the ARM fragment. The most prominent feature is the correlated motion of the lower DPP III domain and the ARM fragment.
During the simulation, Zn2+ was mostly hexacoordinated. The coordination was accomplished by the Nε atoms of His432 and His437, the carboxylate oxygen atoms of Glu433 and Glu460, and two water molecules. The carboxylates of both Glu433 and Glu460 coordinated the metal ion monodentately during the entire simulation. Occasionally, an additional water molecule replaced Glu433 in the zinc coordination sphere. A typical representation of metal ion coordination during the simulation is given in Fig 6. Our previous QM/MM study on human DPP III showed that a hexacoordinated zinc ion is energetically the most favourable type of Zn2+ coordination in the open and semi-open conformations of hDPP III [63].
In addition to the homology model, the protein structure extracted from the 100 ns simulated PgDPP III-Arg2-2NA complex (simulations of the complexes are described below) was used to study the behavior of ligand-free PgDPP III in water. During two 150 ns long MD simulations of this PgDPP III structure, neither the protein secondary structure (see S4 Table) nor the protein compactness changed significantly (Rgyr fluctuated between 31 and 32.5 Å). The zinc ion coordination was also similar to that observed during the previous, 200 ns long MD simulations of the homology-derived PgDPP III model (see S7 Fig).
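Protein compactness is tracked here through the radius of gyration. As a minimal sketch, the mass-weighted radius of gyration of a single frame can be computed as below; the function name and the two-atom toy system are hypothetical, and the analysis tool actually used for the simulations is not specified in this section.

```python
import numpy as np

def radius_of_gyration(positions, masses):
    """Mass-weighted radius of gyration for one frame.
    positions: (n_atoms, 3) coordinates in Å; masses: (n_atoms,) atomic masses."""
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center_of_mass = np.average(positions, axis=0, weights=masses)
    sq_dist = np.sum((positions - center_of_mass) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Toy check: two equal masses 2 Å apart lie 1 Å from their centre of mass.
rg = radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [12.0, 12.0])
print(rg)
```

Applying such a function to every frame of a trajectory gives the time series from which a fluctuation range can be read off.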
MD simulations of PgDPP III in complex with diarginyl arylamide substrates. The PgDPP III binding site is situated deep in the interdomain cleft. It is mostly defined by amino acid residues from the helices H11 (410-436) and H12 (452-473), situated at the bottom of the DPP III upper domain, and from the beta strand E7 (Ile366-Asn372/Asp374) in the upper part of the lower domain beta core. The PgDPP III-Arg2-2NA complex was built using the PgDPP III structure obtained after 150 ns of the productive MD simulations of the ligand-free protein and the crystal structure of the tynorphin complex of the E451A variant of human DPP III (PDB code: 3T6B) as a template [26]. The resulting structure was equilibrated for 30 ns and two replicas of the PgDPP III-Arg2-2NA complex were simulated, one for 150 ns and the other for 200 ns. During these simulations, the radius of gyration fluctuated between 31.5 and 32.5 Å (S8 Fig). In the final structures, the substrate is stabilized by several strong hydrogen bonds and electrostatic interactions with charged amino acid residues such as Glu291, Glu304, Asp359, Asp374, Glu433 and Glu460 (S5 Table and S9 Fig). While the naphthylamide group mostly remained in place in the large, partly hydrophobic pocket, where it interacts with Ala348, Ile366, Gly367 and Thr429, the arginine side chains changed their orientation during the MD simulations. This behavior of the substrate is a consequence of the shape of the substrate binding pocket. Since the hydrophobic bottom end of the binding pocket is bottle shaped, rotations of the naphthylamide group are sterically hindered. The rest of the binding pocket, on the other hand, is partially water exposed and relatively wide, with negatively charged regions dispersed around its surface (S10 Fig), which enables the positively charged arginine side chains to adopt different orientations.
It is therefore not surprising that in the final structures, the orientation of the side chain of the first Arg (from the N-terminus) is different in the two replicas. Zn2+ was coordinated by the Nε atoms of the His432 and His437 imidazoles, the Glu433 and Glu460 carboxylate oxygens and the carbonyl oxygen of the second arginine of the substrate during the simulations of both replicas.
The PgDPP III-Arg2-AMC complex was built from the equilibrated PgDPP III-Arg2-2NA complex by replacing the naphthyl ring with AMC. It was simulated for 150 ns, during which the radius of gyration fluctuated between 31 and 32 Å. As in the PgDPP III-Arg2-2NA complex, Zn2+ is coordinated by His432, Glu433, His437, Glu460 and the carbonyl oxygen of the second arginine of the substrate during the simulation (S11 Fig). In the final structure, the Arg2-AMC orientation in the enzyme active site is similar to that of Arg2-2NA in one (200 ns simulated) replica (Fig 8). However, the side chain of the second arginine from the N-terminus in Arg2-AMC is oriented differently than in Arg2-2NA and is stabilized by an interaction with Asp374 (see S11 Fig for substrate binding and the Zn2+ coordination).
H/D exchange of the ligand-free protein.
Earlier studies have shown that the hydrogen/deuterium exchange (HDX) approach can provide data on protein structure and flexibility [56,64,65]. In this work, we combined HDX results with data extracted from MD simulations in order to gain detailed insight into protein structure and dynamics; the HDX results also serve as a check on the reliability of the MD simulations. The results of the HDX experiment for every peptide at each incubation period were correlated with the results of the MD simulations. An amide site was counted as open for the hydrogen/deuterium exchange reaction in every snapshot in which either its NH or CO group was in contact with a water molecule. The ratio of closed to open states of amide site j, Closed/Open = (Ntotal - Nsolvated)/Nsolvated, where Ntotal is the total number of snapshots and Nsolvated the number of snapshots in which the site is solvated, was used as the protection factor (PFj) in the equation for the peptide deuterium content Dpep (equation given in Material and methods). The overall correlation of the HDX data with the results of the simulations starting from the homology model of PgDPP III is about 0.50, whereas the correlation for the longer simulated structure, namely the PgDPP III structure extracted from the simulated PgDPP III-Arg2-2NA complex, is about 0.65 (S12 and S13 Figs). During the MD simulations in water, the PgDPP III structure becomes more compact and, judging by the comparison with the HDX experiment, this compact structure is more reliable than the initial, extended structure obtained by homology modelling. In both cases, a better correlation (about 0.53 and 0.71; S14 Fig and Fig 9, respectively) was obtained for the DPP III region than for the ARM region.
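The closed/open ratio described above can be illustrated with a short sketch. It assumes that solvation of an amide site has already been reduced to one boolean per snapshot; the function names are hypothetical, and the use of a Pearson coefficient for the comparison with experiment is an assumption, since the type of correlation coefficient is not stated in this section. The equation for the peptide deuterium content is given in Material and methods and is not reproduced here.

```python
import numpy as np

def protection_factor(solvated):
    """Closed/open ratio for one amide site.
    solvated: boolean array over snapshots, True where NH or CO contacts water."""
    solvated = np.asarray(solvated, dtype=bool)
    n_total = solvated.size
    n_solvated = int(solvated.sum())
    if n_solvated == 0:
        return float("inf")   # site never open in the sampled snapshots
    return (n_total - n_solvated) / n_solvated

def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured deuterium uptake
    (assumed metric, for illustration)."""
    return float(np.corrcoef(predicted, measured)[0, 1])

# Toy example: a site solvated in 200 of 1000 snapshots.
flags = np.zeros(1000, dtype=bool)
flags[:200] = True
pf = protection_factor(flags)   # (1000 - 200) / 200
print(pf)
```

A site that is rarely solvated thus receives a large protection factor, and a permanently solvated site a protection factor of zero.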
SAXS. The homology model of full-length PgDPP III was fitted into the SAXS envelope as a rigid body using the 'fit in volume data'-algorithm in Chimera (Fig 10). The molecular envelope of the protein was calculated from the scattering curves using the program DAMMIF, averaged with DAMAVER and refined with DAMMIN. According to the SAXS data, the protein is monomeric in solution and has a molecular weight of 101 kDa (calculated using I(0) and compared to the I(0) value of BSA). The Dmax was found to be 113 Å and the radius of gyration 30 Å (deduced from a Guinier analysis). The distance distribution function p(r) calculated by GNOM points towards an elongated oval particle. The low resolution structure has been deposited into the SAXS database under the code SASDC58 [66].
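The Guinier analysis and the I(0)-based mass estimate mentioned above can be sketched as follows. This is an illustration on synthetic, noise-free data, not the processing actually applied to the measured curves; the function names are hypothetical, and the standard's molecular weight of 66.4 kDa for BSA as well as the example intensities are assumed values. The mass estimate further assumes equal scattering contrast for sample and standard.

```python
import numpy as np

def guinier_fit(q, intensity):
    """Linear fit of ln I(q) = ln I(0) - (Rg^2 / 3) q^2 at low q.
    Returns (Rg, I(0)); q in 1/Å gives Rg in Å."""
    slope, intercept = np.polyfit(q ** 2, np.log(intensity), 1)
    return float(np.sqrt(-3.0 * slope)), float(np.exp(intercept))

def mw_from_i0(i0, conc, i0_std, conc_std, mw_std):
    """Molecular weight from concentration-normalised forward scattering,
    relative to a standard protein of known molecular weight."""
    return mw_std * (i0 / conc) / (i0_std / conc_std)

# Synthetic Guinier region for a particle with Rg = 30 Å, limited to q*Rg <= 1.3.
rg_true, i0_true = 30.0, 2.0
q = np.linspace(0.005, 1.3 / rg_true, 50)
intensity = i0_true * np.exp(-(q * rg_true) ** 2 / 3.0)
rg_fit, i0_fit = guinier_fit(q, intensity)

# Hypothetical numbers: sample I(0) twice that of the standard at equal concentration.
mw = mw_from_i0(i0=2.0, conc=1.0, i0_std=1.0, conc_std=1.0, mw_std=66.4)
print(rg_fit, i0_fit, mw)
```

With real data the fit would be restricted to the low-angle points that satisfy the Guinier condition and would carry experimental uncertainty.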

Discussion
Dipeptidyl peptidases of P. gingivalis have been the subject of extensive research because these exopeptidases liberate dipeptides from the amino-end of their natural substrates, and P. gingivalis is known to utilize dipeptides preferentially, instead of free amino acids, as its source of energy [28]. Until now, besides DPP IV, which is a proven virulence factor, three other DPPs have been reported: DPP5, DPP7 and DPP11 [67][68][69]. They are all serine peptidases localized in the periplasm, but they differ in their substrate specificity [15]. In addition to DPPs, gingipains R and K were demonstrated to contribute to dipeptide production from extracellular oligopeptides [15]. Investigations with gene-disrupted mutants of P. gingivalis ATCC 33277 indicated that DPPs and gingipains cooperatively liberate dipeptides from nutrient oligopeptides [67]. In their search for an unidentified enzyme possessing a DPP7-like substrate specificity, Ohara-Nemoto et al. [67] also expressed the gene PGN_1645 coding for a putative DPP III from P. gingivalis strain ATCC 33277. They examined the substrate specificity of the recombinant protein with dipeptidyl-AMC substrates and found the highest activity for Arg-Arg-AMC. This protein is very similar to DPP III from P. gingivalis strain W83, described in our study: it is only 20 amino acids longer (906 amino acids) and shares 96% identity. The same authors have also shown that gingipain null-mutant KDP136 cells had defects in Arg-Arg-AMC hydrolysis compared to the wild-type P. gingivalis ATCC 33277, and therefore suggested that DPP III did not participate in extracellular dipeptide production in P. gingivalis. This finding indicated that P. gingivalis DPP III is not a periplasmic enzyme. However, the probable cytosolic localization of PgDPP III does not exclude its importance in protein metabolism (i.e. oligopeptide cleavage).
In this study we have characterized an atypical dipeptidyl peptidase III, the product of the gene PG_0317, using molecular biology, biochemical, biophysical, and computational chemistry methods. This 886 residues long protein contains an additional C-terminally appended domain predicted to possess the superhelical ARM-type fold.
Our investigation of the biochemical properties confirmed that PgDPP III acts as a true dipeptidyl peptidase III, cleaving dipeptides sequentially from the N-terminus of an oligopeptide. PgDPP III is capable of hydrolysing peptides of various compositions and lengths, such as the octapeptide angiotensin II (DRVYIHPF) and its hexapeptide fragment VYIHPF, as well as the pentapeptides VVYPW (tynorphin) and IVYPD (Fig 2). Among the dipeptidyl-2NA substrates, PgDPP III prefers diarginyl-2NA (S2 Table).
Kinetic analyses of the hydrolysis of the preferred substrate Arg2-2NA showed similarities in Km and kcat values compared to B. thetaiotaomicron DPP III [24], and pronounced differences in comparison to the yeast enzyme [70]. The catalytic efficiency (kcat/Km) of full-length PgDPP III is 50-fold higher than that of the yeast counterpart, due to a 12-fold decrease in Km and a 4.2-fold increase in kcat. In contrast, compared to human DPP III [70], the kcat/Km value of the P. gingivalis enzyme is 12- and 26-fold lower for Arg2-2NA and Arg2-AMC, respectively, predominantly owing to the difference in kcat values.
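The 50-fold figure for the comparison with the yeast enzyme follows from the two fold changes quoted above, since the catalytic efficiency is the ratio of kcat to Km. As a short worked check (the superscripts Pg and yeast are labels introduced here for the two enzymes):

```latex
\frac{(k_\mathrm{cat}/K_\mathrm{m})^{\mathrm{Pg}}}{(k_\mathrm{cat}/K_\mathrm{m})^{\mathrm{yeast}}}
= \frac{k_\mathrm{cat}^{\mathrm{Pg}}}{k_\mathrm{cat}^{\mathrm{yeast}}}
\times \frac{K_\mathrm{m}^{\mathrm{yeast}}}{K_\mathrm{m}^{\mathrm{Pg}}}
\approx 4.2 \times 12 \approx 50
```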
Interestingly, Arg-Arg-2NA is a much better substrate for PgDPP III than Arg-Arg-AMC (2.6-fold higher kcat and 6.8-fold higher catalytic efficiency) (Table 1). Human DPP III also prefers Arg2-2NA over Arg2-AMC, but entirely on account of a change in the Km value [70].
Due to the high sequence similarity with B. thetaiotaomicron DPP III, we were able to obtain a homology model of PgDPP III and to study the structure and dynamics of the ligand-free enzyme as well as of its complexes with the substrates Arg2-2NA and Arg2-AMC, using computational methods.
MD simulations revealed a difference between the binding of Arg2-2NA and Arg2-AMC in the enzyme active site (Fig 8, S9 and S11 Figs), which is reflected in the different intensity and persistence of some important hydrogen bonds. For example, the interaction with Glu433 from the active-site motif H432ECLGH437, a residue that according to our previous study on hDPP III takes part in the catalytic reaction [71], is missing in the PgDPP III-Arg2-AMC complex (S5 Table).
Although we expected the separately expressed DPP III domain of PgDPP III to have activity similar to that of the full-length protein, its catalytic efficiency for the hydrolysis of both diarginyl arylamide substrates was approximately three orders of magnitude lower (Table 1). A possible explanation for the importance of the C-terminal domain for the peptidase activity is our finding that the motions of the C-terminal fragment and the DPP III lower domain are correlated. Our previous structural studies of human DPP III (737 amino acids) have shown that the ligand-free enzyme fluctuates between an elongated form with two domains separated by a wide cleft and more compact forms with the upper and lower lobes closer to each other [26,72]. Peptide binding promotes the closure of the binding site and shifts the protein structure to the highly compact form [62], which is a prerequisite for the catalytic activity of human DPP III [71] and probably for the activity of the other members of the M49 family as well, since they share similar protein folds [24,25]. The present MD simulations of full-length PgDPP III suggest that its N-terminal DPP III region adopts a relatively compact form in solution, similar to the so-called 'semi closed form' reported for the human orthologue (S4 Table). In addition, the simulations showed a significant reorganization in the C-terminal ARM region that correlated with a motion of the lower DPP III domain and with an increase in the compactness of the whole enzyme structure (Fig 3). Therefore, we suppose that the C-terminally appended domain influences the interdomain dynamics of the DPP III region as well as peptide binding. A comparison between the MD simulations and the experimental HDX results indicated that PgDPP III adopts a more compact structure in solution than predicted by homology modeling, with the motion of the lower domain of the DPP III fragment being highly correlated with the reorganization of the C-terminal ARM fragment.
The ARM-type fold is multi-helical and comprises two curved layers of alpha helices arranged in a right-handed superhelix [73]. Domains and repeats with an ARM-like fold are found in a number of different proteins involved in various important cellular processes [74]. The ARM repeat fold was also found in the C-terminal domain of aminopeptidase B and the bifunctional enzyme leukotriene A4 hydrolase, as well as in aminopeptidase O, a human brain metallopeptidase, all members of the M1 family, which is characterized by the zinc-binding motif HEXXHX18E [75]. To our knowledge, the function of the ARM domain in these aminopeptidases has not been investigated in detail.
Although the complementation assays in a DNA-repair deficient Escherichia coli strain indicated the absence of any alkylation repair function in PgDPP III, the presence of an appended ARM repeat domain at the C-terminus of PgDPP III indicates multifunctionality of this protein and opens new avenues for future research.