The Extended Cleavage Specificity of Human Thrombin

Thrombin is one of the most extensively studied of all proteases. Its central role in the coagulation cascade as well as several other areas has been thoroughly documented. Despite this, its consensus cleavage site has never been determined in detail. Here we have determined its extended substrate recognition profile using phage-display technology. The consensus recognition sequence was identified as P2-Pro, P1-Arg, P1′-Ser/Ala/Gly/Thr, P2′-not acidic and P3′-Arg. Our analysis also identifies an important role for a P3′-arginine in thrombin substrates lacking a P2-proline. In order to study the kinetics of this cooperative or additive effect we developed a system for insertion of various pre-selected cleavable sequences in a linker region between two thioredoxin molecules. Using this system we show that mutations of P2-Pro and P3′-Arg lead to an approximate 20-fold and 14-fold reduction, respectively, in the rate of cleavage. Mutating both Pro and Arg results in a drop in cleavage of 200–400 times, which highlights the importance of these two positions for maximal substrate cleavage. Interestingly, no natural substrates display the obtained consensus sequence but represent sequences that show only 1–30% of the optimal cleavage rate for thrombin. This clearly indicates that maximal cleavage, excluding the help of exosite interactions, is not always desired and may instead cause problems with dysregulated coagulation. It is likely that exosite cooperativity has a central role in determining the specificity and rate of cleavage of many of these in vivo substrates. Major effects on cleavage efficiency were also observed for residues as far away as 4 amino acids from the cleavage site. Insertion of an aspartic acid in position P4 resulted in a drop in cleavage by a factor of almost 20.


Introduction
Proteases are essential for a large number of important biological processes such as fertilization, blood clotting, food digestion and immunity, where they constitute approximately 2% of the total human proteome [1]. A key to the regulation of these processes is their ability to select the correct targets among a myriad of substrates. This is made possible by the specific recognition of substrate sequences containing typically 7-8 contiguous amino acid (aa) residues [2]. Some proteases are highly specific, having relatively strict preferences for the majority of these 7-8 aa and therefore only cleave a few selected targets, whereas others cleave almost any substrate with the preferred aa in the P1 position, i.e. adjacent to where the peptide bond is cleaved. Experimental identification of the recognition sequences adds very important information about a protease's biological function, facilitates the identification of proteases for site-specific proteolysis, provides a basis for the design of good substrates for kinetic studies and helps in the design of efficient inhibitors. There is also considerable medical interest in proteases, with an estimated 14% of all human proteases being investigated as potential targets in drug development [3].
Thrombin is arguably the most extensively studied of all human proteases. It is a serine protease with essential functions in blood coagulation and in numerous other regulatory processes. Known natural substrates for thrombin include coagulation factors V, VIII, XI and XIII, protein C and fibrinogen [4]. It also activates platelets via cleavage of protease-activated receptors (PAR) -1, -3 and -4. Interestingly, thrombin regulates the coagulation process both positively, by cleaving prothrombin, FV and FVIII, and negatively, by cleaving protein C (reviewed in [4,5,6]). Due to its vital importance, the substrate recognition profile of thrombin has been studied in detail since the early 1980s [7,8]. Various techniques have been used, including chromogenic peptide substrates and combinatorial methods using libraries of substrate peptides with fluorogenic leaving groups or fluorescence-quenched substrates (see Table 1) [9,10]. These studies have shown a strong preference for arginine in position P1 and for proline in position P2 [7,8,11,12,13,14]. Aliphatic aa have been seen to be preferred in position P4 [14,15]. Position P1′ almost always has serine, threonine, glycine or alanine [14,16,17]. Aromatic aa are favored in position P2′ [18,19], and basic residues in position P3′ [18,19,20]. Acidic residues are avoided, especially in positions P3 and P3′ [13,14,17]. These studies, which are summarized in Table 1, have resulted in a relatively detailed picture of the cleavage specificity of thrombin. However, there are limitations with these studies. The preferences for aa N-terminal or C-terminal of the cleavage site have been determined separately. In other studies, one or several positions have been fixed or only a limited number of combinations have been tested. Interactions depending on subsite cooperativity are consequently easily overlooked.
To overcome these problems we have now determined the extended cleavage specificity of thrombin using phage substrate display technology. This method utilizes a library of approximately 5 × 10⁷ bacteriophages [21] where one capsid protein displays a randomized, individual oligopeptide sequence coupled to a six-histidine purification tag. Protease-susceptible oligopeptide sequences are identified and amplified, usually in five rounds of selection, so that all final sequences have been selected by the protease of interest on five different occasions. The competition of suitable targets at a low concentration with countless non-substrate molecules for access to the active site probably closely resembles in vivo situations. Phage display allows the simultaneous investigation of primed and non-primed substrate positions, and can inform about subsite cooperativity. Compared to the analysis of individual peptides, which is also sensitive to subsite cooperativity, phage display has the advantage that numerous sequences can be investigated in a short time. Other advantages include that phage display is virtually unbiased, works independently of the P1 specificity and tolerates large variations in the degree of selectivity. The method may fail only for proteases that require a three-dimensional substrate structure not provided by phage-displayed peptides [22].
Table 1 notes: Letters in bold indicate investigated positions; residues that were held constant are in parentheses. The preferred amino acids are denoted in the order of preference. Equally favorable residues are indicated by the absence of a slash (/). n.d., not determined; -, not applicable; pNA, para-nitroanilide. doi:10.1371/journal.pone.0031756.t001
In this communication we present a detailed analysis of the extended cleavage specificity of the active site of human thrombin, minimizing the influence on cleavage specificity by long-distance exosite interactions. This analysis conforms very well to the previously observed preferences, as summarized in Table 1. In addition, the phage display results suggest a cooperative or additive effect between subsites P2 and P3′. A comparison between the consensus sequence and a panel of known in vivo substrates also showed that no natural substrates display the consensus sequence but represent sequences that show only 1-30% of the optimal cleavage efficiency for thrombin. This very interesting finding indicates that maximal cleavage, in the absence of exosite interactions, is not always desired but instead may cause problems with excessive or dysregulated coagulation. A low cleavage rate of the selected sequence may be strongly enhanced by strong and specific exosite interactions.
Moreover, we present a screening of the human proteome for potential novel thrombin targets using the derived consensus cleavage motif, Pro-Arg-[AlaGlySerThr]-[not AspGlu]-Arg (i.e. P-R-[AGST]-[not DE]-R). A list of 73 such potential targets is presented where the majority are involved in cell adhesion, the nervous system, development/differentiation and circulatory homeostasis. Some of them may prove to be novel important targets for this multifaceted enzyme.

Results
Phage display analysis of the extended cleavage specificity of human thrombin
A library of T7 phage-displayed nonamer peptides was subjected to five rounds of selection with 0.2 U or 1 U of human thrombin (1.5 and 7.5 nM of thrombin) [21,33,34]. This library contains approximately 5 × 10⁷ independent bacteriophages [21].
The ratio of phages released in thrombin-treated samples compared to the PBS control increased steadily with each selection round, reaching 240 in samples with 1 U thrombin and 136 with 0.2 U of thrombin after five selection rounds (data not shown). Seventeen or eighteen DNA sequences coding for cleavage-susceptible peptides were sampled from plaques representing selection with 0.2 U or 1 U of thrombin, respectively. All sequences were aligned to the most frequently observed pattern of at least four aa, i.e. [other]-[basic]-[small hydrophobic]-X-[basic] (Fig. 1A, 1F). The consensus could then be refined to P-R-[AGST]-[not DE]-R. This consensus closely reflects the collected results from thirteen previous studies (Table 1). The strongest preference was observed for the first arginine in the consensus, which is therefore likely to represent the P1 position as determined from the phage display results. This is in accordance with previously established data (see Table 1).
Notably, we retrieved seven inserts from thrombin-selected phages where the sequence flanking the random nonapeptide amino-termini was mutated to encode Leu-Thr-Pro-Arg-Gly instead of Leu-Thr-Pro-Gly-Gly ("!" in Fig. 1). Five of these sequences have arginine in position P3′, in accord with the refined consensus. We have never before observed mutations in the nonrandomized region of selected peptides [21,28,33]. The retrieval of these sequences in the present study demonstrates that even very infrequent sequences that represent good substrates can be recovered by phage display.
Amino acid prevalence in positions P4 to P4′ as derived by phage display conforms with natural substrates and previous studies
Based on our alignments, we analyzed the prevalence of aa in each single position (Fig. 2) and, as stated above, thrombin's long-known requirement for arginine in position P1 was reproduced [8,9,12,35]. In position P2, the well-established proline [7,8,11,12,36,37] dominated (71%), but aliphatic aa were also tolerated. P2 glycine, valine or isoleucine were together present in 23% of the sequences. Although several earlier studies report similar findings, recent studies have mostly focused on P2 proline (see Table 1). However, aliphatic P2 residues are present in a number of natural thrombin substrates (Fig. 1C), including fibrinogen Aα and Bβ, two cleavage sites in factor V (R737 and R1573), and PAR-3.
Position P3 was not very restricted, but excluded negatively charged aa. The most frequent residues here were glycine (29%), threonine (23%) and arginine (17%) (Fig. 2), three aa with differing biochemical and structural characteristics. A broad specificity, as well as an exclusion of acidic residues, has previously been observed for position P3 [8,13,14,15]. Intriguingly, several natural substrates have acidic P3 residues, e.g. factor VIII (site R759), PAR-1 (site R41), rat fibrinogen α/α-E and protein C. The negative contribution of the acidic residue may here be compensated for by exosite interactions. In line with this view, a synthetic peptide corresponding to protein C residues P7 to P5′ is in itself a poor thrombin substrate [13].
A more restricted preference was found in position P4, with aliphatic glycine or leucine in 31% or 37% of the sequences, respectively. This is in accordance with previous reports [14,15]. Furthermore, aliphatic P4 residues are frequently found in natural substrates (Fig. 1 and [12]).
In position P3′, we observed a strong preference for arginine (69%), similar to results obtained with fluorescence-quenched substrates [18,19]. The possibility of subsite cooperative effects involving position P3′ is discussed below.
Position P4′ was quite unspecific. The five most frequent aa were valine (20%), leucine (14%), serine (14%), arginine (14%) and histidine (9%). Aliphatic residues are frequent in the P4′ position of natural substrates, but P4′ is probably not a major specificity determinant. One previous study including the P4′ position also found a broad tolerance of aa [14].
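The per-position percentages reported above can be obtained by simple counting over the aligned peptides. The following is a minimal sketch of that tabulation, assuming each peptide has already been aligned to the eight positions P4 to P4′. The four sequences and the names `POSITIONS`, `ALIGNED` and `position_frequencies` are hypothetical and serve only as illustration; they are not the clones shown in Fig. 1.

```python
from collections import Counter

# Position labels for an 8-residue window around the scissile bond (P1 | P1').
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# Hypothetical aligned peptides (illustration only, not data from the study).
ALIGNED = ["LTPRGVRL", "GTPRSWRV", "LGVRGFRL", "LRPRAVLS"]


def position_frequencies(peptides):
    """Return {position: {residue: percent}} for equal-length aligned peptides."""
    if any(len(p) != len(POSITIONS) for p in peptides):
        raise ValueError("every peptide must span exactly P4 to P4'")
    table = {}
    for index, position in enumerate(POSITIONS):
        counts = Counter(p[index] for p in peptides)
        table[position] = {
            residue: 100.0 * n / len(peptides) for residue, n in counts.items()
        }
    return table


freqs = position_frequencies(ALIGNED)
for position in POSITIONS:
    print(position, freqs[position])
```

With real data, the input would be the 35 sequenced clones after alignment, and the percentages would correspond to the values plotted in Fig. 2.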

Phage display results indicate that an arginine in position P3′ is important in substrates lacking proline in position P2
After aligning the phage-displayed peptides, we analyzed the representation of the consensus within the single sequences. Interestingly, we observed that all thrombin-susceptible peptides with residues other than proline in position P2 hold arginine/lysine in position P3′, whereas this is the case in only 64% of the peptides with a P2 proline (Fig. 1). This indicates that binding of substrate residues P2 and P3′ to their thrombin subsites may be partially interdependent (subsite cooperativity). Natural substrates where P2 is not proline, such as fibrinogen Aα and Bβ, factor V (site R737), PAR-1 (site R25) and PAR-3, hold arginine in position P3′, whereas most substrates with proline in position P2 do not hold arginine in the P3′ position (Fig. 1C). The phage display results indicate that P2 proline and P3′ arginine are not mutually exclusive. Rather, the absence of an advantageous P2 residue, proline, in some substrates seems to be compensated for by the presence of an advantageous P3′ residue, arginine.
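The comparison described above is a conditional frequency: the share of peptides with a basic P3′ residue, computed separately for peptides with and without a P2 proline. A minimal sketch follows, assuming 8-residue peptides aligned from P4 to P4′ (so that index 2 is P2 and index 6 is P3′). The peptide list and the function name `p3prime_basic_fraction` are hypothetical.

```python
# Hypothetical aligned peptides, P4..P4' (illustration only).
PEPTIDES = ["LTPRGVRL", "LTPRSVLL", "GTPRAWRV", "LGVRGFRL", "LGIRSVKL"]

P2_INDEX = 2        # third residue of the P4..P4' window
P3_PRIME_INDEX = 6  # seventh residue of the P4..P4' window


def p3prime_basic_fraction(peptides, p2_is_pro):
    """Fraction with Arg/Lys at P3' among peptides with (or without) Pro at P2.

    Returns None if no peptide falls into the requested group.
    """
    group = [p for p in peptides if (p[P2_INDEX] == "P") == p2_is_pro]
    if not group:
        return None
    return sum(p[P3_PRIME_INDEX] in "RK" for p in group) / len(group)


with_pro = p3prime_basic_fraction(PEPTIDES, p2_is_pro=True)
without_pro = p3prime_basic_fraction(PEPTIDES, p2_is_pro=False)
print(f"P2 = Pro:  {with_pro:.0%} have Arg/Lys at P3'")
print(f"P2 != Pro: {without_pro:.0%} have Arg/Lys at P3'")
```

With so few clones in the non-proline group, a difference of this kind is suggestive rather than statistically conclusive, which is one reason the effect was followed up with the recombinant substrates.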

Verifying the consensus sequence by the use of a new type of recombinant substrate
In order to verify the results from the phage display analysis and to estimate the importance of individual aa positions for the rate of cleavage, a new type of recombinant substrate was developed. The consensus sequence obtained from the phage display analysis was inserted in a linker region between two E. coli thioredoxin molecules. A number of mutations in individual aa positions from this consensus sequence, the cleavage sites of a few in vivo substrates, and a few unrelated substrate sequences were also produced with this system. This was achieved by ligating the corresponding oligonucleotides into the BamHI/SalI sites of the vector (Fig. 3A and Table 2). All of these substrates were expressed as soluble proteins and purified to obtain a protein with a purity of 90-95%. These recombinant proteins were then used to study the preference of human thrombin for the different sequences (Figs. 3 and 4). The same concentration (18 µM) of all substrates was used in all experiments to obtain quantitative measurements of relative cleavage rate between the different sequences. The same concentration of thrombin (9 nM) was also used in all experiments except in two instances: when studying a few poor substrates for thrombin, three or ten times more of the enzyme was added to the same amount of substrate (Fig. 3). In most experiments the molar ratio of substrate to protease was therefore approximately 2000.
Figure 1 legend (fragment): IGFBP, insulin-like growth factor-binding protein; PAR, protease-activated receptor. The cleavage site of thrombin in the natural substrates listed in panel C is numbered from the N terminal of the prepro protein, from the first methionine. This list of natural substrates is a selection of a few of the most well known substrates of this enzyme. However, the list of potential in vivo substrates is much longer and includes many other proteins such as protein S, TAFI, antithrombin, heparin cofactor II and nexin I. doi:10.1371/journal.pone.0031756.g001
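As a quick check of the assay stoichiometry described in this section, the substrate-to-protease ratio follows directly from the two concentrations. The sketch below assumes the substrate concentration is 18 micromolar, which is the reading consistent with the stated ratio of about 2000 at 9 nM thrombin. The variable names are illustrative.

```python
# Assumed concentrations in mol/L (18 micromolar substrate, 9 nanomolar thrombin).
SUBSTRATE_M = 18e-6
THROMBIN_M = 9e-9

# Enzyme amounts used: standard, and the 3x and 10x amounts used for poor substrates.
ENZYME_FACTORS = (1, 3, 10)

# Molar substrate:enzyme ratio for each enzyme amount.
ratios = {f: SUBSTRATE_M / (f * THROMBIN_M) for f in ENZYME_FACTORS}

for factor, ratio in ratios.items():
    print(f"{factor:>2}x thrombin: substrate/enzyme ratio of about {ratio:.0f}")
```

Even at ten times the standard enzyme amount, the substrate remains in roughly 200-fold molar excess over thrombin.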
Thrombin was found to cleave the consensus sequence (LTPRGVRL) very efficiently. When the proline residue in the P2 position of the thrombin consensus sequence was changed into a valine, the second most preferred aa based on the phage display result (LTVRGVRL), the efficiency of cleavage by thrombin dropped by a factor of approximately 20 (Fig. 3B and 3D). When the arginine residue in the P3′ position of the thrombin consensus sequence was changed into a leucine, also the second most preferred aa based on the phage display result (LTPRGVLL), the efficiency of cleavage by thrombin dropped by a factor of 10-15 (Fig. 3B and 3D). When both the proline residue in the P2 position and the arginine in position P3′ of the thrombin consensus sequence were altered, into a valine and a leucine respectively (LTVRGVLL), the efficiency of cleavage by thrombin dropped by a factor of 200-400 (Fig. 3B and 3D). These results show the major importance of these two residues in conferring the substrate specificity of thrombin.
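One way to judge whether the two positions act independently is to compare the double-mutant effect with the product of the single-mutant effects, since independent contributions multiply in rate and add in free energy. The sketch below uses the fold reductions reported above. Treating these apparent rate reductions as changes in activation free energy, and the temperature of 37 °C, are assumptions made for illustration and are not taken from the study.

```python
import math

GAS_CONSTANT = 8.314   # J / (mol K)
TEMPERATURE_K = 310.15  # assumed 37 degrees C; not stated in this section

P2_FOLD = 20                  # Pro -> Val at P2
P3_PRIME_FOLD_RANGE = (10, 15)  # Arg -> Leu at P3'
DOUBLE_FOLD_OBSERVED = (200, 400)


def ddg_kj_per_mol(fold_reduction):
    """Free-energy equivalent (kJ/mol) of a fold reduction in cleavage rate."""
    return GAS_CONSTANT * TEMPERATURE_K * math.log(fold_reduction) / 1000.0


# If the two subsites were independent, the fold reductions would multiply.
predicted_double = tuple(P2_FOLD * f for f in P3_PRIME_FOLD_RANGE)

print("predicted double-mutant reduction:", predicted_double)
print("observed double-mutant reduction: ", DOUBLE_FOLD_OBSERVED)
print(f"P2 single mutant: about {ddg_kj_per_mol(P2_FOLD):.1f} kJ/mol")
```

Under these assumptions the predicted range (200-300) overlaps the observed range (200-400), so the recombinant-substrate data are compatible with a largely additive contribution of P2 and P3′, with any extra cooperative component lying within the resolution of the gel-based estimate.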
When analyzing the phage display results in detail, we also observed that no aromatic aa are present in position P1′. This position was relatively unspecified with approximately equal representation of four different aa: glycine (29%), alanine (29%), serine (29%) or threonine (14%). A tryptophan was inserted in the P1′ position (LTPRWVRL) and tested for efficiency in cleavage. No cleavage of this substrate was observed, indicating that no large bulky aa is tolerated in this position. Similarly, in position P3 we did not observe any aromatic aa and only one example of an aromatic aa is found for this position in the natural substrates listed in Figure 1 (Factor V (R1573)). A substrate was produced where tryptophan was introduced in position P3 instead of the preferred threonine (LWPRGVRL). However, this mutation had no effect on the cleavage rate (Fig. 3C).
A lack of negatively charged aa in positions P2′ and P3′ has been observed. Therefore a mutant where an aspartic acid was inserted in position P2′ was tested (LTPRGDRL). This substrate showed a reduction in cleavage by approximately 15 times compared to consensus. The effect of this mutation was almost as severe as mutating the proline in position P2 and as severe as mutating arginine in the P3′ position. In contrast, introducing an aspartic acid in position P4′ (LTPRGVRD) had only a minor effect on the rate of cleavage, by a factor of 2-3, compared to the consensus (Fig. 3C).
A number of additional substrates were also included in this study. The optimal sequences for cleavage by the human mast cell chymase (HC) and the opossum mast cell chymase (OC) have recently been determined [29,31]. When analyzing the cleavage of these two sequences (VGLWLDRV and VVLFSEVL) we observed that human thrombin leaves these sequences completely untouched even after using 10 times more enzyme (Fig. 3E). This result shows the high selectivity in substrate selection by thrombin.
Figure 2 legend (fragment): ... Figure 1A and 1B. For clarity, amino acids are displayed in functional groups, starting to the left with aromatic residues, and ending with acidic residues to the right. doi:10.1371/journal.pone.0031756.g002
Figure 3. Analysis of the cleavage specificity by the use of new types of recombinant protein substrate. Panel A shows the overall structure of the recombinant protein substrates used for analysis of the efficiency in cleavage by thrombin. In these substrates two thioredoxin molecules are positioned in tandem and the proteins have a His6-tag positioned in their C termini. The different cleavable sequences are inserted in the linker region between the two thioredoxin molecules by the use of two unique restriction sites, one BamHI and one SalI site, which are indicated in the bottom of panel A. Panels B to E show the cleavage of a number of substrates by thrombin, where individual amino acids have been changed from the thrombin consensus sequence. The name and sequence of the different substrates are indicated above the pictures of the gels. The time of cleavage (in minutes) is also indicated above the corresponding lanes of the different gels. The uncleaved substrates have a molecular weight of approximately 25 kDa and the cleaved substrates appear as two closely located bands with a size of 12-13 kDa. doi:10.1371/journal.pone.0031756.g003
Following the initial screening, we decided to extend the analysis to a number of additional substrates. From the phage display data we had observed an almost complete lack of negatively charged aa in all eight aa positions surrounding the cleavage site. Therefore an aspartic acid residue was placed in various positions in the substrate. The insertion of an aspartic acid in the P4 position (DTPRGVRL) surprisingly showed a major effect on cleavage, a drop in efficacy by a factor of 20-30 (Fig. 4A). Interestingly, insertion of an aspartic acid in position P3 (LDPRGVRL) had a much less pronounced effect. Here efficacy dropped by a factor of 2-3 (Fig. 4A). However, the insertions of an aspartic acid in positions P2 or P1′ (LTDRGVRL and LTPRDVRL) had dramatic effects. An almost complete lack of cleavage was observed (Fig. 4A and 4B). Insertion of an aspartic acid in the P3′ position also showed a marked effect on cleavage, a reduction by a factor of approximately 20-30 relative to the consensus sequence (Fig. 4B).
None of the substrates obtained from the phage display analysis had lysine in the P1 position. However one in vivo substrate, PAR-3, has been shown to have a lysine in this position (Fig. 1C). In order to test the selectivity for arginine over lysine (both positively charged aa) we exchanged arginine for lysine in one of the synthetic substrates (LTPKGVRL) (Fig. 4B). Interestingly the analysis of the cleavage rate of this substrate showed that arginine is preferred over lysine by a factor of approximately 10 (Fig. 4B), indicating a relatively high selectivity for arginine in the P1 position.
From the previous analysis we had seen that introduction of an aromatic aa in the P1′ position completely blocked cleavage; therefore we decided to test other aa substitutions in this position. Introducing an aspartic acid in this position also completely blocked cleavage, whereas a leucine in this position (LTPRLVRL) showed an approximate 10-fold reduction in cleavage (Fig. 4C).
Insertion of a tryptophan in position P2′, instead of the consensus valine (LTPRGWRL), had no effect or even a minor positive effect on the rate of cleavage.
When we compared the consensus sequence obtained from the phage display analysis with the list of natural in vivo substrates presented in Figure 1C, we observed that no in vivo substrate corresponded to the consensus sequence. Interestingly, the natural substrates appeared to be relatively poor substrates for human thrombin. In order to substantiate this conclusion, we selected four relatively different in vivo substrates and produced recombinant substrates containing the eight aa region spanning these four cleavage sites. The first substrate tested corresponded to arginine 211 in protein C (substrate number 16 in Figure 4C). This sequence (VDPRLIDG) was found to be a very poor substrate for thrombin. In our analysis we could not detect any cleavage after 150 minutes, which shows that its cleavage is 1% or below that of the consensus substrate. The second in vivo substrate that was analyzed was the region of arginine 327 in prothrombin (FNPRTFGS). This substrate showed relatively good cleavage (Fig. 4D). However, only 25-30% cleavage compared to the consensus substrate was observed (Fig. 4D). The third in vivo substrate (substrate 18) was another cleavage site within prothrombin. This site, which corresponds to the region around arginine 200 (MTPRSEGS), was a poor site for thrombin cleavage. This site was 20-30 times less efficient than the consensus site (Fig. 4D). The fourth in vivo site that was studied was the region corresponding to arginine 35 in fibrinogen A alpha (GGVRGPRV). This site was also a very poor site for thrombin. Similarly to the protein C substrate, no cleavage could be detected even after 150 minutes, again showing that its cleavage is 1% or less of that of the consensus site (Fig. 4D). All four of these "in vivo" substrates were cleaved at no more than 25-30% efficiency compared to the consensus substrate (Fig. 4C and 4D).
This confirmed our conclusion based on the phage display analysis that most natural in vivo substrates are relatively poor substrates for human thrombin when presented as linear peptides. Long-range exosite cooperative effects may here help in increasing the local concentration of the substrate and thereby increasing the rate of cleavage. The data from the recombinant substrate analysis have been summarized in Table 2.

Novel candidate substrates for thrombin identified by PROSITE search
Known natural thrombin substrates mostly align to only three or four positions in the consensus recognition sequence, P-R-[AGST]-[not DE]-R. Thus, database searches with the full consensus may identify novel potential thrombin substrates. We searched the Swiss-Prot, TrEMBL and PDB databases for human (H. sapiens) proteins holding the P-R-[AGST]-[not DE]-R motif, yielding 651 hits in 602 protein sequences. In at least 75 proteins, the motif was extracellular or secreted (Tables 3, 4, 5 and 6) and therefore potentially accessible for thrombin. Interestingly, a total of 73 of these proteins seem involved in one or several of four areas: cell adhesion, the nervous system, development and differentiation, or circulatory homeostasis (Tables 3, 4, 5 and 6). More specifically, 36 proteins (48%) have been implicated in cell adhesion. Among these are eight collagen variants and integrin αV, i.e. central components of the extracellular matrix (ECM). Thirty-three proteins (44%) have roles in the nervous system, including three roundabout homologs and persephin, all of which are implicated in neurotropic activity. Thirty proteins (40%) are involved in development/differentiation, and eleven (14.7%) in circulatory homeostasis.
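The consensus motif used for this search maps directly onto a regular expression; in PROSITE pattern syntax it would be written P-R-[AGST]-{DE}-R, where braces denote excluded residues. The following is a minimal sketch of such a scan over a single protein sequence. It is not the PROSITE tool used in the study, and the function name `find_sites` and the test sequences are illustrative. A lookahead is used so that overlapping occurrences are reported, which matters because one sequence can contain more than one hit.

```python
import re

# P-R-[AGST]-[not DE]-R, wrapped in a lookahead so overlapping hits are found.
# Note: [^DE] also accepts non-standard letters such as X.
MOTIF = re.compile(r"(?=(PR[AGST][^DE]R))")


def find_sites(sequence):
    """Return (1-based position of the P2 proline, matched 5-mer) for every hit."""
    return [(m.start(1) + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# The consensus substrate contains the motif; the protein C site does not,
# because leucine at P1' falls outside [AGST].
print(find_sites("LTPRGVRL"))
print(find_sites("VDPRLIDG"))

# Two overlapping occurrences in a made-up sequence.
print(find_sites("MKPRAPRAVRLL"))
```

The scissile bond in each hit lies between the second and third residue of the matched 5-mer (P1 Arg and P1′). Whether a hit is accessible to thrombin in vivo still has to be judged separately, for example from annotated localization, as was done for the candidates listed in Tables 3 to 6.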

Discussion
Substrate phage display technology has made it possible to elucidate the substrate recognition profile of human thrombin from position P4 to P4′, completely and simultaneously. Compared to previous studies, the profile obtained increases the detail substantially but also conforms relatively well with results presented in reports during the past 25 years (see Table 1). It should also be pointed out that only two previous studies report results for eight positions [14], but P1 to P4 and P1′ to P4′ were, in one of these studies, investigated separately and with two different approaches. Moreover, distinct subsites were held constant or not investigated, which effectively prevents the analysis of subsite cooperativity effects. The results for positions P2′, P3′ and P4′ reported from that work were also less specific than with phage display. The second study is a phage display analysis that was performed on human thrombin and on thrombin in combination with thrombomodulin or hirugen [39]. However, only a diagram of residue preferences was included; there are no original data about individual clones. Furthermore, the high variability in the analysis made it difficult to draw any conclusions about a potential consensus cleavage site [39]. When we compare our phage display data with previous investigations, the strong preferences for arginine in position P1 [8,9,12,35] and for proline in position P2 [7,8,11,12,36] were reproduced. Position P3 was found to be rather unspecific. However, negatively charged aa have previously been reported not to be tolerated in this position [13]. In contrast to this finding, we observe a remarkably high tolerance for most aa in this position, including negatively charged aa (Fig. 1 and Fig. 4). Position P4 featured preferentially hydrophobic aa, namely leucine, which has been the most frequent aa reported by several studies as well as ours [14,15]. Here we can report a remarkably high degree of specificity.
Negatively charged aa are apparently not tolerated in this position, as we see a drop in cleavage by 20-30 times when introducing an aspartic acid in this position (Fig. 4). For position P1′, our study and two others [14,17] have consistently identified serine, alanine, glycine and threonine to be by far the most preferred residues. Aromatic aa were the most preferred in position P2′. Le Bonniec and Marque with coworkers have reported phenylalanine to be most preferred [18,19], and we find tryptophan to be the most prevalent, with phenylalanine as the third most frequent aa. In position P3′, our data, along with those of others, show arginine to be most preferred [19,20]. Position P4′ has previously not been extensively investigated, but seems to display a preference for aliphatic residues.
Substrates aligning to the consensus in positions P1, P1′ and P2′ may obtain sufficient affinity to thrombin either by a proline in the P2 position (which is well documented) or, in the absence of P2 proline, by an arginine in position P3′. However, the thrombin consensus recognition sequence does not indicate this, because most experimental thrombin substrates obtained by phage display hold both proline in position P2 and arginine in P3′. The cooperativity effects involving position P3′ have previously also been indicated from mutagenesis studies of single peptides [13,17]. The second position involved in these studies was P3, but position P2 was held constant, and several cooperativity mechanisms may exist. Several important physiological thrombin targets do indeed lack proline in position P2 and hold arginine in P3′, e.g. fibrinogen chains Aα and Bβ, factor V, PAR-1 and PAR-3 (Fig. 1C). However, the preferred and activating PAR-1 cleavage by thrombin is LDPR-SFLL, holding P2 proline. The P3′ arginine in fibrinogen Aα has great biologic significance, as illustrated by the fact that replacement of this residue with glycine, serine or asparagine leads to bleeding disorders [20]. This underlines that knowledge of subsite cooperativity effects can be medically very important. Subsite cooperativity is difficult to study experimentally but probably exists in many enzymes. A review on subsite cooperative effects in proteases, which summarizes the available information concerning this interesting phenomenon, has recently been published [39]. A detailed study on the cleavage specificity of factor Xa by Bianchini et al. from 2002 also comes to the conclusion that the efficiency in cleavage by factor Xa is primarily a result of exosite interactions and not the specificity of the active site [40]. This article also contains a detailed study of the cleavage specificity of human thrombin.
In this article thrombin was used as a reference for the analysis of factor Xa. By using a large panel of fluorescence-quenched substrates, the authors mapped the cleavage specificity of thrombin between the P3 and P3′ residues [40].
Their results are very similar to the results we obtain for this region by phage display. For example, they saw that the P3 position is relatively unspecific with a slight preference for methionine, threonine and arginine. In the P2 position they see that proline is the most preferred aa, giving almost one order of magnitude higher activity than leucine or valine in this position. In the P1′ position they identify serine, alanine, glycine and threonine as the four most preferred aa. This is in full agreement with our data and with data from several other labs. However, in contrast to our results they find that phenylalanine is also well tolerated in this position [40]. In the P2′ position they observe a preference for aromatic aa, which is in agreement with our results. The strong preference for arginine in the P3′ position is also identical between the two studies [40]. Interestingly, none of the in vivo substrates identified (listed in Figure 1) have both the P2 proline and the P3′ arginine of the consensus site. This may indicate that suboptimal cleavage sites are preferred over the consensus site. The sequences of the in vivo substrates show a cleavage that is 10-100 times less efficient than the consensus site. The analysis of four selected "in vivo" sites also substantiates this conclusion by showing that these four sites were only cleaved at an efficacy of 1-30% of the consensus site (Fig. 4C and 4D). This is a finding that may seem puzzling. However, a too efficient cleavage may potentially cause excessive coagulation and a risk of unwanted thrombosis. A similar situation has been observed in the skin, where the serine proteases kallikrein 5 and 7 are present in a region where the pH is suboptimal for maximal cleavage [41]. Both of these enzymes have a slightly basic pH optimum, whereas the outer layer of the skin has an acidic pH.
When the pH is artificially increased, for example by the use of neutral or alkaline soaps, the serine protease activity increases [42]. This increase in protease activity leads to premature degradation of corneodesmosomes, inactivation of β-glucocerebrosidase and acidic sphingomyelinase and subsequent impairment of epidermal barrier function [41,42]. Here the protease activity apparently has to be kept under suboptimal conditions so as not to cause tissue damage.
A second and more likely explanation is that long-distance subsite cooperative effects play a major role in determining the specificity [43,44]. It is well known that the specificity of the interaction between thrombin and its substrates stems not only from the interaction of the substrate with the catalytic subsites from S4 to S4′, but also from interactions with the anion-binding exosites I and II, also called the fibrinogen-recognizing exosite and the heparin-binding exosite, respectively. These two exosites are positively charged domains that flank the active site. They interact with negatively charged regions of the substrate. Such interactions may thereby compensate for a lack of strong direct interactions with the active site. There are even indications that exosite interactions may be the major source of substrate specificity for some targets [44]. The cleavage of protein C by thrombin does, for example, increase approximately 1500-fold by interaction with thrombomodulin and approximately 10 000-fold by interaction with thrombomodulin in the presence of phospholipid membranes [43]. The mechanism of this cooperative effect is not fully known, but it is unlikely that the exosite interaction has any dramatic effect on the specificity of the active site of thrombin. Instead, the effect of the exosite is probably primarily in recruiting the substrates. Such a potent recruitment effect may result in an increase in the local concentration of the substrate and a more efficient cleavage. The list of potential natural substrates for thrombin in Figure 1 is by no means complete. Many other potential in vivo sites have been identified.
For example, three sites identified in the thrombin-sensitive region of protein S (R49, R60 and R70) also have sequences that indicate that they are far from optimal sites for thrombin (VCLRSFGT, TAARQSTN, PDLRSCVN) [45]. One important cleavage site for thrombin in thrombin-activatable fibrinolysis inhibitor (TAFI) is arginine 302 [46]. This site (SYTRSKSK) also differs markedly from the consensus site. It thereby appears that the vast majority of the identified in vivo sites for thrombin are relatively poor sites for this enzyme and that other interactions probably play a major role in determining the efficiency of cleavage.
The availability of the substrate site probably also has a major impact on the efficiency of cleavage. If the site is exposed, in an accessible surface loop, it may be efficiently cleaved. However, if the site is located in a region where thrombin has difficulty contacting it, the cleavage may be very inefficient irrespective of whether the consensus sequence is present. The recombinant substrates used in this analysis have the cleavable sequences in an accessible conformation, which enables an unbiased comparison of the cleavage specificity, whereas the natural substrates may vary considerably in accessibility. This has to be taken into consideration when comparing the efficiency of cleavage of the different natural substrates. However, the consensus cleavage site for thrombin is relatively heavily charged, with two arginines, and also contains a proline, which introduces a bend in the peptide chain. This indicates that most consensus sites are probably exposed on the surface of the potential target molecules.
The phage display analysis also resulted in several additional findings concerning important restrictions in the aa tolerated in various positions of the cleavage site for human thrombin. For example, the complete lack of aromatic aa in the P1′ position of the substrates was very interesting. The aa in this position seems to be of major importance, as very little variation is observed among all identified in vivo substrates and no aa other than glycine, alanine, serine and threonine was seen among the substrates obtained in the phage display analysis (Fig. 1). The only in vivo sites for thrombin that deviate from this pattern are protein C and coagulation factor XI, which in this position have a leucine and an isoleucine, respectively (Fig. 1). Introduction of a tryptophan in this position of the consensus sequence resulted in a complete block in cleavage, which shows the importance of this position in substrate selection. Introducing an aspartic acid in this position also completely blocked cleavage, whereas a leucine resulted in a reduction in cleavage rate by a factor of 10 (Fig. 4C). Large aromatic aa are apparently not tolerated, possibly with the exception of phenylalanine [40], and negatively charged aa also severely affect cleavage in this position. However, aromatic aa are tolerated, and even potentially favored, in other positions, such as the P2′ position (Figs. 1 and 4C).
Among the sequences originating from the phage display analysis we see an almost complete lack of negatively charged aa in all eight positions from P4 to P4′ (Fig. 1). Introduction of an aspartic acid into position P2 or P1′ resulted in a complete block in cleavage, and in the P2′ position aspartic acid resulted in a drop in cleavage rate of approximately 15 times. Interestingly, the introduction of this negatively charged aa in the P4 position, which is relatively far from the actual cleavage site, also resulted in a dramatic drop in cleavage rate by a factor of 20-30 (Fig. 4A). These findings indicate that introduction of a negatively charged aa in almost any position, possibly except the P4′ and the P3 positions, is accompanied by severe effects on the cleavage rate. The marked negative effect of introducing an aspartic acid in the P4 position of substrates also shows that positions relatively far from the cleavage site can be of major importance for efficient cleavage. Kinetic parameters of an enzyme that have been established using short synthetic substrates, which normally are only three or four aa long and lack all aa either N- or C-terminal of the cleavage site, should therefore be considered only qualitatively. Such kinetic parameters probably have little relevance to the actual in vivo situation with natural protein substrates.
The complete lack of cleavage of the P1′ tryptophan mutant and the two mast cell chymase sites, even after prolonged cleavage, also shows the very high specificity displayed by human thrombin. This is in marked contrast to many other serine proteases, for example the mast cell chymases, which after prolonged cleavage show activity towards a relatively broad range of substrates [29,30,47].
A relatively detailed picture of all eight positions that may contribute to the substrate specificity of human thrombin has now been obtained by the phage display analysis in combination with the recombinant substrates. These results may also promote the identification of novel substrates, for example by contributing consensus motifs and individual cleavage-susceptible sequences for database searches. Our ProSite search with the refined consensus P-R-[AGST]-[not DE]-R has led to the identification of 73 potential novel targets (Tables 3, 4, 5, and 6). These group in the fields of cell adhesion, the nervous system, development/differentiation and circulatory homeostasis. However, when evaluating these new potential substrates we need to keep in mind that numerous potential targets may be missed, because most of the in vivo sites identified so far are not consensus sites. A very broad screening, on the other hand, gives too many potential sites to handle in this type of analysis.
Thrombin has previously been shown to digest numerous ECM components, such as nidogen, fibronectin, laminin and type V collagen [48,49,50]. The potential novel substrate integrin αV is therefore especially interesting, because it is a receptor for several previously known thrombin substrates, including prothrombin, fibronectin and laminin. The digestion of these central ECM molecules by thrombin may have important medical implications, for example in preventing the metastatic crossing of the ECM by tumors, and in wound healing.
The idea that thrombin may play an important role in the development and function of the brain has previously been suggested by the fact that the protease is expressed in various brain regions, especially during development and in regions exhibiting plasticity [51]. Thrombin has also been implicated in pathologic brain conditions, including adverse processes following CNS injury [52,53,54,55,56], and may influence the direction of neurite outgrowth [57]. Indeed, several novel candidate substrates have neurotropic functions, such as Roundabout homologs 1, 2 and 3 and persephin. Although PAR-1 is the described target of thrombin in many processes in the nervous system [5,53,56,58,59,60], alternative targets may very well exist.
In summary, the use of substrate phage display technology in combination with the newly developed recombinant substrates has made it possible to determine the substrate recognition profile of the active site of thrombin, from position P4 to P4′, completely and simultaneously. This study has also resulted in a very detailed picture, in relative kinetic terms, of the contribution of individual residues to cleavage specificity. The combination of these two techniques has made it possible to study the specificity of the catalytic site while excluding exosite-dependent interactions. The obtained profile conforms very well with previous studies, and adds important kinetic parameters to these results, using substrates that contain not only the aa either N- or C-terminal of the cleavage site but all eight aa of the extended specificity. One very interesting finding is that most natural substrates are not optimal substrates for thrombin. This suggests that optimal sites, which are cleaved efficiently even in the absence of strong exosite interactions, could lead to too efficient a cleavage, which may cause excessive coagulation and a risk of unwanted thrombosis. Exosite interactions may here facilitate the interaction between enzyme and substrate and increase both the rate and the substrate specificity of cleavage. In addition, the use of the consensus site to screen the human proteome has resulted in the identification of a panel of 73 potentially novel substrates for thrombin, some of which may prove to be important targets for this multifaceted enzyme.

Thrombin
Lyophilized powder of human plasma thrombin (SIGMA T-6884) was diluted in double-distilled water to a concentration of 0.2 NIH units/ml. One U or 0.2 U of diluted thrombin were used in two separate phage display analyses.

Analysis of thrombin's extended recognition sequence by substrate phage display
The cleavage specificity of thrombin was investigated with a T7 phage-displayed peptide library containing approximately 5 × 10⁷ individual nonamers as previously described [21,28,33,47]. In this library, randomized nonamers are inserted in the carboxy-terminus of T7 capsid protein 10A, followed by a six-histidine tag (His6-tag) for purification. The constant region at the amino-terminal flank of the peptides consists of the aa proline-glycine-glycine, breaking any secondary structures imposed by the capsid protein. In brief, phages are anchored to nickel nitrilotriacetic acid (Ni-NTA) beads via the His6-tags before the first protease treatment. The protease is added and allowed to react overnight, releasing phages displaying cleavage-susceptible 9-mers from the beads. Samples are centrifuged and cleaved phages are collected in the supernatant. These phages are amplified in the E. coli strain BLT5615 and enter the next selection round (biopanning). After five biopannings, enriched cleavage-susceptible phages are sequenced.
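To put the library size in perspective, the following back-of-the-envelope sketch compares the 5 × 10⁷ clones with the theoretical number of nonamers and estimates how many clones would be expected to carry a given short motif. It assumes, purely for illustration, that all 20 aa are equally likely at every randomized position (real codon-encoded libraries are biased) and it ignores double counting of overlapping motif occurrences; the function name `expected_clones` is hypothetical.

```python
# Rough library-coverage estimate under an idealized, uniform-aa assumption.
LIBRARY_SIZE = 5 * 10**7   # approximate number of individual nonamers (from the text)
PEPTIDE_LENGTH = 9         # randomized nonamer
ALPHABET = 20              # assumption: all 20 aa equally likely at each position

def expected_clones(motif_length,
                    library_size=LIBRARY_SIZE,
                    peptide_length=PEPTIDE_LENGTH,
                    alphabet=ALPHABET):
    """Approximate expected number of clones containing one fully
    specified motif of the given length in any register of the peptide."""
    registers = peptide_length - motif_length + 1
    return library_size * registers / alphabet**motif_length

total_nonamers = ALPHABET**PEPTIDE_LENGTH
print("possible nonamers:", total_nonamers)
print("fraction sampled :", LIBRARY_SIZE / total_nonamers)
print("expected clones with one specific 5-aa motif:", expected_clones(5))
```

Under these assumptions only a small fraction of all possible nonamers is sampled, yet any specific five-residue motif is still expected to be present in tens of clones, which is consistent with the library being suitable for selecting short recognition sequences.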
For the analysis of thrombin, an aliquot of approximately 10⁹ plaque-forming units was immobilized on 100 μl Ni-NTA agarose beads and incubated for 1 hr with gentle rotation at 4 °C. Unbound phages were removed by ten washes with 1.5 ml 1 M NaCl, 0.1% Tween-20 in PBS, pH 7.2, and two subsequent washes with 1.5 ml PBS. The beads were resuspended in 1 ml PBS. One U or 0.2 U of thrombin was added, and control samples with PBS instead of protease were run in parallel. A concentration of 0.2 U/ml corresponds to a thrombin concentration of approximately 1.5 nM. Note that the 137 mM sodium concentration in PBS implies that most of the thrombin is expected to be in the "fast" form [61]. Digestion and mock-digestion proceeded overnight at room temperature under gentle rotation. Samples were centrifuged briefly on a tabletop centrifuge, pelleting the Ni-NTA beads. A control elution of the phages remaining bound to the Ni-NTA beads, using 100 μl 100 mM imidazole, showed that at least 1 × 10⁸ phages were attached to the matrix in each selection round. Cleavage-susceptible phages were recovered, amplified and selected in five rounds as described earlier [21,28,33].
Fifty plaques were then arbitrarily isolated from Luria broth (LB) ampicillin (amp) plates representing biopannings with 1 U and 0.2 U of thrombin. Each plaque was dissolved in 100 μl phage extraction buffer (100 mM NaCl and 6 mM MgSO4 in 20 mM Tris-HCl, pH 8.0) and shaken vigorously for 30 min. Phage DNA from the variable region in the capsid 10A gene was amplified by PCR with vector-specific primers. PCR fragments were purified with the Omega Bio-Tek E.Z.N.A.™ micro elute kit (Omega Biotech, Vancouver Island, Canada). Purified PCR fragments were sequenced on an ABI PRISM® 3700 DNA Analyzer. Nineteen and 18 individual inserts coding for cleavage-susceptible peptides were sequenced from plaques representing selection with 1 U or 0.2 U of thrombin, respectively. One sequence containing a stop mutation (from selection with 1 U of thrombin) and one sequence file of poor quality (from selection with 0.2 U of thrombin) were discarded.
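Once the sequenced inserts have been aligned on the cleaved arginine, the preference at each position can be summarized by simple counting. The sketch below illustrates this tabulation; it is not the procedure used in this study, the alignment step itself is assumed to have been done beforehand, and the four peptides are hypothetical placeholders rather than sequences from Figure 1.

```python
from collections import Counter

# Position labels for an eight-residue window around the scissile bond.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

def position_frequencies(aligned_peptides):
    """Count the residues observed at each position of pre-aligned peptides.

    All peptides must have the same length; returns one Counter per position.
    """
    length = len(aligned_peptides[0])
    if any(len(p) != length for p in aligned_peptides):
        raise ValueError("all aligned peptides must have the same length")
    return [Counter(p[i] for p in aligned_peptides) for i in range(length)]

# Hypothetical aligned peptides (P4..P4'), for illustration only.
peptides = ["LTPRSFRG", "GLPRGVRA", "MVPRAFRL", "LTLRSWRA"]

for label, counts in zip(POSITIONS, position_frequencies(peptides)):
    print(label, counts.most_common(2))
```

With real data, the most frequent residues per position give a first approximation of the consensus, which can then be tested with the recombinant substrates described in the next section.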

Generation of recombinant substrates for the analysis of the cleavage specificity
A new type of substrate was developed to verify the results obtained from the phage display analysis. Two copies of the E. coli thioredoxin gene were inserted in tandem into the pET21 vector for bacterial expression (Fig. 3A). At the C-terminal end, a His6-tag was inserted for purification on Ni-NTA agarose beads (Qiagen GmbH, Hilden, Germany). In the linker region, between the two thioredoxin molecules, the different substrate sequences were inserted by ligating double-stranded oligonucleotides into two unique restriction sites, one BamHI and one SalI site (Fig. 3A). The sequences of the individual clones were verified after cloning by sequencing of both DNA strands. The plasmids were then transformed into the E. coli Rosetta-gami strain for protein expression (Novagen, Merck, Darmstadt, Germany). A 10 ml overnight culture of the bacteria harbouring the plasmid was diluted 10 times in LB Amp and grown at 37 °C for 1-2 hr until the OD600 reached 0.5. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was then added to a final concentration of 1 mM. The culture was grown at 37 °C for an additional 3 hr under vigorous shaking, after which the bacteria were pelleted by centrifugation at 3500 rpm for 12 min. The pellet was washed once with 25 ml PBS and 0.05% Tween 20. The pellet was then resuspended in 2 ml PBS and sonicated 6 × 30 seconds to open the cells. The lysate was centrifuged at 13000 rpm for 10 min and the supernatant was transferred to a new tube. Five hundred μl of Ni-NTA slurry (50:50) (Qiagen, Hilden, Germany) was added and the sample was allowed to rotate slowly for 45 min at RT. The sample was transferred to a 2 ml column, allowing the supernatant to slowly pass through the filter and leaving the Ni-NTA beads with the bound protein in the column. The column was washed four times with 1 ml of washing buffer (PBS, 0.05% Tween, 10 mM imidazole, 1 M NaCl).
Elution of the protein was achieved by adding 150 μl elution buffer (PBS, 0.05% Tween 20, 100 mM imidazole) followed by five 300 μl fractions of the elution buffer. Each fraction was collected individually. Ten μl from each of the eluted fractions was mixed with 1 volume of 2× sample buffer and 1 μl β-mercaptoethanol and subsequently heated for 3 min at 80 °C. The samples were analyzed on 4-12% pre-cast SDS Bis-Tris PAGE gels (Invitrogen, Carlsbad, CA, USA) and the fractions that contained the most protein were pooled. The protein concentration of the combined fractions was determined using the Bio-Rad DC Protein assay (Bio-Rad Laboratories, Hercules, CA, USA). Approximately 60 μg of recombinant protein was added to each 120 μl cleavage reaction (in PBS). Twenty μl was removed from this tube before adding the enzyme, as the 0 minute time point. The active enzyme was then added and the reaction was kept at room temperature during the entire experiment. Twenty μl samples were removed at the indicated time points (15 min, 30 min, 45 min, 60 min and 150 min) and the reactions were stopped by the addition of one volume of 2× sample buffer. One μl β-mercaptoethanol was then added to each sample, followed by heating for 3 min at 80 °C. Twenty μl from each of these samples was analyzed on 4-12% pre-cast SDS Bis-Tris PAGE gels (Invitrogen, Carlsbad, CA, USA). The gels were stained overnight in colloidal Coomassie staining solution and de-stained for several hours according to previously described procedures [62].
The intensity of the individual bands on the gel was determined by densitometric analysis of scanned high-resolution images of the gels using the program ImageJ (rsb.info.nih.gov/nih-image/). In order to obtain good estimates of the differences in activity towards different substrates, different concentrations of the enzyme were used in several individual experiments. The combined results from these different gels were then used to obtain an accurate estimate of the difference in activity against the various substrates.
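One way such band intensities could be converted into relative cleavage rates is sketched below. This is an illustrative model, not the calculation used in this study: it assumes that the uncleaved substrate band decays with pseudo-first-order kinetics at a fixed enzyme concentration, and it fits the rate constant by least squares through the origin. The function name `first_order_rate` and the intensity values are hypothetical; only the sampling times are taken from the text.

```python
import math

def first_order_rate(times_min, intensities):
    """Estimate a pseudo-first-order rate constant (per minute) from the
    intensity of the uncleaved substrate band over time.

    Fits -ln(I_t / I_0) = k * t by least squares through the origin,
    using the first measurement as I_0.
    """
    if len(times_min) != len(intensities) or len(times_min) < 2:
        raise ValueError("need matching time and intensity series")
    i0 = intensities[0]
    num = den = 0.0
    for t, i_t in zip(times_min[1:], intensities[1:]):
        if i_t <= 0:          # band no longer detectable; skip this point
            continue
        num += t * -math.log(i_t / i0)
        den += t * t
    if den == 0:
        raise ValueError("no usable time points")
    return num / den

# Sampling times from the protocol (minutes).
TIMES = [0, 15, 30, 45, 60, 150]

# Synthetic intensities for illustration only (arbitrary units).
consensus = [100.0 * math.exp(-0.040 * t) for t in TIMES]
mutant = [100.0 * math.exp(-0.002 * t) for t in TIMES]

ratio = first_order_rate(TIMES, consensus) / first_order_rate(TIMES, mutant)
print(f"consensus is cleaved about {ratio:.0f}-fold faster than the mutant")
```

In practice a single enzyme concentration rarely gives measurable decay for both a very good and a very poor substrate within the same time window, which is presumably why several enzyme concentrations were combined; comparing rates across gels would then also require normalizing for enzyme concentration.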

PROSITE scan for the Phage Display consensus motif
The Swiss-Prot, TrEMBL and PDB databases were searched by a PROSITE Pattern Scan (http://www.expasy.ch/tools/scanprosite/) for human (Homo sapiens) proteins holding the consensus motif derived from sequenced phage inserts, P-R-[AGST]-[not DE]-R. No description filter was chosen, at most one character was allowed to match a conserved position in the pattern, and the match mode was set to ''greedy, overlap, no includes''. Protein hits were assessed individually for the localization of the motif, expression pattern and presumable function. A number of hypothetical or poorly characterized proteins were not further assessed.
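For readers who want to apply the same motif to their own sequences, the consensus can be expressed as a regular expression. The sketch below is only a local illustration of the pattern logic and does not reproduce the ScanProsite options described above; the function name `find_consensus_sites` and the test sequence are hypothetical. Note that `[^DE]` also accepts non-standard symbols such as X, so input should be restricted to the 20 standard aa if that matters.

```python
import re

# P-R-[AGST]-[not DE]-R written as a regular expression.
# The lookahead wrapper lets overlapping matches be reported as well.
CONSENSUS = re.compile(r"(?=(PR[AGST][^DE]R))")

def find_consensus_sites(sequence):
    """Return (1-based start position, matched peptide) for every
    occurrence of the consensus motif in a protein sequence."""
    return [(m.start() + 1, m.group(1))
            for m in CONSENSUS.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKLPRSFRAAPRGDRTTPRTWR"
for position, peptide in find_consensus_sites(toy_sequence):
    print(position, peptide)
```

In the toy sequence, PRSFR and PRTWR match, whereas PRGDR is rejected because of the aspartic acid in the P2′ position.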