Adaptive Evolution and Divergence of SERPINB3: A Young Duplicate in Great Apes

A series of duplication events led to an expansion of clade B serine protease inhibitors (SERPINs), which currently display a large repertoire of functions in vertebrates. Accordingly, the recent duplicates SERPINB3 and B4, located in the human 18q21.3 SERPIN cluster, control the activity of different cysteine and serine proteases, respectively. Here, we aim to assess the coevolution of SERPINB3 and B4 with their target proteases in order to understand the evolutionary forces shaping the accelerated divergence of these duplicates. Phylogenetic analysis of primate sequences placed the duplication event in a Hominoidae ancestor (∼30 Mya) and the emergence of SERPINB3 in Homininae (∼9 Mya). We detected evidence of strong positive selection throughout the SERPINB4/B3 primate tree and in the target proteases cathepsin L2 (CTSL2), cathepsin G (CTSG) and chymase (CMA1). Specifically, in the Homininae clade a perfect match was observed between the adaptive evolution of SERPINB3 and cathepsin S (CTSS), and most of the sites under positive selection were located at the inhibitor/protease interface. Altogether, our results seem to favour a coevolution hypothesis for SERPINB3, CTSS and CTSL2, and for SERPINB4, CTSG and CMA1. A scenario of accelerated evolution driven by host-pathogen interactions is also possible, since SERPINB3/B4 are potent inhibitors of exogenous proteases released by infectious agents. Finally, similar patterns of expression and the sharing of many regulatory motifs suggest neofunctionalization as the best-fitting model for the functional divergence of the SERPINB3 and B4 duplicates.


Introduction
Proteolysis is involved in the regulation of numerous biological processes and is fundamental to every cell and organism. The activity of proteases is regulated by a complex network of inhibitory molecules, and different human pathologies such as arthritis, cancer, neurodegenerative and cardiovascular diseases can be associated with the deleterious effects of uncontrolled proteolysis. Thus, the regulation of endogenous proteases is crucial for maintaining the homeostasis and health of organisms [1,2].
Serine protease inhibitors (SERPINs) are key elements in the regulation of proteolytic pathways, controlling the activity of serine proteases and helping to prevent the pernicious effects of excessive proteolysis [1]. Some SERPINs can also inhibit cysteine proteases, acting as cross-class inhibitors, while others have lost their inhibitory activity and developed other functions, such as serving as hormone carriers or chaperones [1,3,4]. SERPIN superfamily members share a conserved tertiary structure [5] with an exposed reactive center loop (RCL), which carries the protease recognition site and acts as a pseudo-substrate determining protease specificity [6]. Inhibitory SERPINs regulate protease activity through a unique suicide mechanism in which the RCL binds to the protease and is then cleaved between the P1 and P1′ residues (the scissile bond), resulting in the formation of a covalent complex that irreversibly locks both SERPIN and protease [5,7].
Vertebrate SERPINs exhibit distinct exon-intron patterns [8] and segregate evolutionarily into nine clades (A-I) [1]. The clade B SERPINs differ from other SERPINs by the absence of a signal peptide and by the occurrence of an additional polypeptide loop between helices C and D (CD-loop), present in most members [1]. Their cellular localization is limited to the cytoplasmic and/or nuclear compartments, where SERPINBs play a cytoprotective role through the inhibition of proteases involved in cell death [3,4]. However, several SERPINBs (SERPINB2, B3, B5 and B7) [6] can be released from cells under certain conditions, which in most cases is thought to result from passive cell loss or lysis [1,4]. Moreover, it has become apparent that these proteins participate, alone or in concert with other molecules, in the regulation of intricate proteolytic cascades implicated in tumor suppression, apoptosis, inflammation and angiogenesis, among others, through complex and still-obscure mechanisms [1,9,10].
In the SERPIN superfamily, gene duplication events are likely to underlie the functional diversification of the inhibitory repertoire of these proteins [16]. This phenomenon is well illustrated in vitro by the mouse homologues Serpinb3a-d: Serpinb3a inhibits both chymotrypsin-like serine proteases and papain-like cysteine proteases [17], Serpinb3b inhibits both papain-like cysteine proteases and trypsin-like serine proteases, and no inhibitory activity was detected for Serpinb3c and Serpinb3d [16]. Likewise, the human homologs SERPINB3 and B4 (formerly known as squamous cell carcinoma antigen 1 (SCCA1) and 2 (SCCA2), respectively) share a sequence identity of 92% but regulate the activity of distinct proteases. In vitro experiments demonstrate that SERPINB3 targets cysteine proteases such as the cathepsins L1, L2, K and S (CTSL1, CTSL2, CTSK and CTSS) [18,19], whereas SERPINB4 is a potent inhibitor of the serine proteases cathepsin G (CTSG) and mast cell chymase (CMA1) and a poor inhibitor of CTSS when compared with SERPINB3 (50 times less efficient) [20].
In a healthy state, SERPINB3 and B4 play a major role in cell protection against cytotoxic molecules, mainly through the inhibition of CTSS that may leak into the cytoplasm as a result of lysosome failure [4,21,22]. Conversely, in cancer SERPINB3 was found to inhibit apoptosis, circumventing the mechanism of cell death and favouring tumour growth and metastasis [23][24][25]. Indeed, the overexpression of SERPINB3 in some types of squamous cell carcinomas, namely uterine cervix carcinoma, esophagus carcinoma, head and neck carcinomas, breast carcinoma and hepatocellular carcinoma, is correlated with a poor prognosis [9]. For this reason, SERPINB3 and B4 have been regarded as important serum biomarkers for the diagnosis and prognosis of squamous cell carcinomas [26]. Moreover, SERPINB3 is up-regulated in patients suffering from systemic sclerosis, psoriasis, bronchitis and pneumonia [4,27], reduced in patients with hepatitis C infection, and undetectable in patients with systemic lupus erythematosus [28].
Besides the role in cancer and autoimmunity, SERPINB3 and B4 have a dual role in the immune response to pathogens. Recent studies have shown that SERPINB3 may act as a surface receptor for the binding of hepatitis B virus to hepatocytes and to peripheral blood mononuclear cells [29][30][31]. In contrast, SERPINB3 and B4 can also target extrinsic proteases derived from several pathogens suggesting a protective role against the deleterious effects of several pathogenic organisms [32,33].
Interestingly, SERPINB3 and B4 were previously identified as an example of young gene duplicates under positive selection in the hominid lineage [34]. Duplication events are regarded as an important source of innovation, underlying the emergence of gene families from a single ancestral gene and contributing to the increase in complexity of eukaryotic genomes [35]. Two alternative models are frequently used to explain the evolution and retention of duplicate genes. The neofunctionalization model [36] holds that the gain of a novel function by one gene copy is the main reason for the retention of duplicates in the genome [37]. The subfunctionalization model [38], on the other hand, predicts relaxed selective constraints affecting both duplicates equally, such that neither copy alone is sufficient to perform the original function and both copies are maintained in the genome [37].
Here, we combine phylogeny-based tests and protein structural analysis to assess the evolution of SERPINB3 and B4 and their target proteases, with a view to understanding the selective forces shaping the divergence of the SERPINB3 and B4 duplicates and its potential implications for human health and disease. Our results suggest that the SERPINB3 duplicate is evolving under positive selection, supporting the functional divergence observed in several experimental studies.
The nonsynonymous/synonymous substitution rate ratio (dN/dS = ω) was estimated using the maximum likelihood (ML) framework implemented in the program CODEML of the Phylogenetic Analysis by Maximum Likelihood (PAML) software [43]. We used ω values to investigate the selective pressures that have shaped the evolution of the SERPINB3 and B4 duplicates and their known targets CTSS, CTSL1, CTSL2, CTSK, CTSG and CMA1. We used three likelihood ratio test (LRT) approaches to detect genes under positive selection. First, the branch model evaluates the strength of natural selection in one or more phylogenetic clades by comparing a model with a single ω value for all lineages (M0) with a model assuming a different ω value for each lineage (free-ratio). Second, the site models allow ω to vary among sites of the protein and compare the neutral models M1a and M7 against the positive selection models M2a and M8, respectively. Third, the branch-site model was used to identify codons under positive selection within a phylogenetic clade by comparing the null model, with a fixed ω = 1 for all the sites in the background, with the alternative model, assuming ω > 1 for all the sites in the foreground [44]. In all cases, the significance of the models was assessed using the likelihood ratio statistic -2Δl with a χ² distribution [43,44]. The Bayes Empirical Bayes (BEB) approach was used to identify amino acids under positive selection [45].
For the ω calculation, sequences carrying species-specific stop codons were removed.
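The likelihood ratio tests described above reduce to a short calculation. The following is a minimal sketch, not the CODEML implementation: it takes the log-likelihoods of a null and an alternative nested model, computes twice the log-likelihood gain of the alternative over the null, and converts it to a χ² p-value. The log-likelihood values in the usage line are hypothetical, and the degrees of freedom (2 for M1a versus M2a and M7 versus M8, 1 for the branch-site test) are the conventional choices for these comparisons, stated here as an assumption because the text does not list them.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (test statistic, p-value) for two nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for an M1a (null) vs. M2a (alternative) comparison.
stat, p = likelihood_ratio_test(lnl_null=-2510.4, lnl_alt=-2498.9, df=2)
print(f"statistic = {stat:.2f}, p = {p:.2e}")
```

The χ² tail is computed with the standard library only, so the sketch has no external dependencies; a statistics package would normally be used instead.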

Protein modelling and docking
The three-dimensional (3D) structures of SERPINB3 (2ZV6), CTSS (2FQ9), CTSL1 (2XU3), CTSL2 (1FH0), CTSK (3KWZ), CTSG (1CGH) and CMA1 (4AG1) were obtained from the Protein Data Bank (PDB) (http://www.rcsb.org). In the case of SERPINB4, the 3D structure was predicted by homology modeling in MODELLER 9.10 using SERPINB3 as template [46]. Structure validation was performed with PROCHECK [47], available in the SWISS-MODEL web server [48]. Next, to assess the possible functional significance of specific amino acid replacements between SERPINB3 and B4 for target protease affinity, the 3D structures were used to generate structural models of inhibitor-protease complexes with the HADDOCK docking web server [49] (http://haddock.science.uu.nl). The published binding residue pairs at the interface region of the inhibitor-protease complex, namely the P1 and P1′ residues of SERPINB3 and B4 and the amino acids that form the catalytic triad of the target proteases, were used to drive the docking process. Visualization of the 3D structures was performed in PyMol 0.99rc6 [50]. The models were evaluated according to the HADDOCK score [51], interface root mean square deviation (iRMSD) and ligand root mean square deviation (lRMSD) [52].

Tissue expression screening of SERPINB3 and SERPINB4
A set of 21 human cDNA samples from different healthy organs was used to study the tissue pattern of SERPINB3 and B4 expression. Except for the first-strand cDNA from leukocytes (Clontech), the RNA from the First Choice Human Total RNA Survey Panel (Ambion) was used as a template to generate cDNA by RT-PCR using a Superscript III system (Life Technologies). PCR amplification was performed using the primers 5′-TGTAGGACTCCAGATAGCAC-3′ and 5′-TGTAGGACTTTAGATACTGA-3′, designed to be unique to the target SERPINB3 and B4 cDNA, respectively, and the primer 5′-TGGAAATACCATACAAAGGCA-3′. GAPDH was employed as a control, using the primers 5′-TCAAGGCTGAGAACGGGAAG-3′ and 5′-AGAGGGGGCAGAGATGATGA-3′ for amplification (see Fig. S1).
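To make the basis of the primer specificity concrete, the short sketch below compares the two gene-specific primers quoted above position by position. It is only an illustration: the helper name `mismatch_positions` is hypothetical, and the comparison assumes that the two primers anneal to homologous positions in SERPINB3 and B4, which the text implies but does not state.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions (5' to 3') at which two equal-length primers differ."""
    if len(a) != len(b):
        raise ValueError("primers must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


# Gene-specific primers as quoted in the text (5' to 3').
SERPINB3_PRIMER = "TGTAGGACTCCAGATAGCAC"
SERPINB4_PRIMER = "TGTAGGACTTTAGATACTGA"

diffs = mismatch_positions(SERPINB3_PRIMER, SERPINB4_PRIMER)
print(f"{len(diffs)} mismatches at positions {diffs}")
```

The two 20-mers differ at six positions (10, 11 and 17 to 20), four of which lie at the 3′ end, where mismatches are generally expected to hinder primer extension most.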

Reconstructing the origin of SERPINB3 and SERPINB4 duplicates
The chromosomal regions of SERPINB3 and B4 from H. sapiens, P. troglodytes, G. gorilla, P. abelii, N. leucogenys, M. mulatta, P. anubis, C. jacchus and S. boliviensis were downloaded from the UCSC and NCBI databases or obtained by direct sequencing. The SERPINB3 and B4 sequences were retrieved from the human reference sequence of chromosome 18 (assembly GRCh37), in a large genomic segment delimited by SERPINB7 and SERPINB12 (chr18: c61429197-61222431), and aligned with the homologous sequences from non-human primates (see Table S1). Overall, sequence alignments revealed a conserved pattern of seven coding exons in primates for SERPINB3 and B4 (Fig. S2). However, in M. mulatta, P. anubis, C. jacchus and S. boliviensis one of the duplicates was absent (Fig. 1A). In addition, the analysis of the predicted cDNA and protein sequences revealed that the P. abelii and N. leucogenys telomeric duplicates have a premature stop codon at positions 60 and 19, respectively, which would cause any resulting protein to be abnormally shortened and suggests that these duplicates are in fact pseudogenes.
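The premature stop codons mentioned above can be located with a simple in-frame scan. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, it assumes the standard genetic code and a coding sequence that starts in frame at its first base, and it is not the procedure used to annotate the primate sequences.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 1-based codon position of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    cds = cds.upper().replace("U", "T")
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the last codon is the expected stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i + 1
    return None


# Toy sequence (hypothetical): ATG GCC TGA AAA TAA has a stop at codon 3.
print(first_premature_stop("ATGGCCTGAAAATAA"))
```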
The phylogenetic tree constructed using functional SERPINB3 and B4 sequences places the duplication event before the divergence of H. sapiens, P. troglodytes and G. gorilla (Fig. 1B). However, the finding of non-functional gene copies in P. abelii and N. leucogenys suggests that a duplication event occurred in a common ancestor of Hominoidae (great apes), after the separation from the Old World monkeys 29.6 million years (MY) ago. Interestingly, the protein alignments obtained for the RCL region in the different primate species suggest the existence of an ancestral SERPINB3/B4 (AncB3/4) with two possible scissile bond (P1-P1′) compositions, either TS or LS (Fig. 1B). The presence of an SS scissile bond suggests that the telomeric gene, named SERPINB3 in humans, arose recently in evolution (about 9 MY ago in Hominidae) as the result of duplication and functional divergence. Notably, SERPINB3 accumulated several other differences in the RCL region, which are likely to have contributed to a shift in its protease affinity.

Adaptive evolution of SERPINB3
We performed a maximum likelihood (ML) analysis, using the codeml program of the PAML software, to test whether the functional divergence of SERPINB3 is a result of positive selection [43,44]. Initially, we estimated the ω ratio for the entire phylogeny (M0 model) and an independent ω ratio for each branch to assess and characterize the selective pressures acting on SERPINB3/B4 evolution. Overall, the M0 model shows a low value of ω for the entire phylogeny (ω ≈ 0.67), suggesting conserved evolution (ω < 1). Also, the comparison of M0 versus the free-ratio model (-2ΔlnL = 16.18, p > 0.05) suggests that the different lineages experienced similar evolutionary rates. However, this result is not unexpected, since averaging across all sites is not a powerful test of adaptive evolution. Hence, we used likelihood ratio tests to compare nested models with and without positive selection to look for evidence of site-specific positive selection in the SERPINB3/B4 phylogeny. The comparisons of M1a (nearly neutral) versus M2a (positive selection) and M7 (beta) versus M8 (beta and ω > 1) show significant (p < 0.001) evidence of positive selection for the SERPINB3 and B4 genes (Table 1). For the M2a and M8 models, the BEB analysis identified the same 17 sites under adaptive evolution (ω > 1) with high posterior probability (P > 90%) (Table 1).
To test whether this signal of positive selection could be connected with the appearance of SERPINB3, we used the branch-site model test. This test allows the ω ratio to vary among sites in the protein and across branches in the tree, to detect whether positive selection was affecting sites along specific lineages. In the SERPINB3/B4 tree the likelihood ratio tests based on the branch-site models were significant (p < 0.01) only for the foreground branch 1 (Fig. S3), which includes the H. sapiens, P. troglodytes and G. gorilla lineages of the SERPINB3 duplicate (Table 2). Although most sites are under constrained evolution, the residues 327G, 351G and 352F were identified by the BEB analysis as being under positive selection (P > 80%) in the SERPINB3 clade (foreground branch 1).
Finally, to evaluate the structural basis of the positive selection signatures detected by the ML analyses, we compared the SERPINB3 and B4 3D structures. Since the SERPINB4 3D structure was not available in the surveyed databases, we used the MODELLER software to calculate a homology model of SERPINB4 using the crystal structure coordinates of SERPINB3 as template (Fig. 2). Structural superimposition of the modelled SERPINB4 structure on the SERPINB3 template showed a very low root mean square deviation (RMSD) of 0.22 Å, which indicates a very similar protein backbone.
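For reference, the RMSD quoted above is the square root of the mean squared distance between matched atoms. The sketch below computes it for two coordinate sets that are assumed to be already superimposed and listed in the same atom order. It does not perform the optimal superposition itself, and the coordinates in the example are made up.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD (same units as the input, e.g. angstroms) between matched coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


# Made-up C-alpha coordinates for two superimposed three-residue fragments.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.2, 0.0), (3.8, 0.0, 0.2), (7.6, 0.2, 0.0)]
print(f"RMSD = {rmsd(template, model):.2f} A")
```

Because a homology model built on a single template inherits most of its backbone from that template, a small RMSD between the two is expected.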
Of the 17 sites under positive selection identified by the site-model analysis, seven correspond to differences in the RCL between SERPINB3 and B4, namely V351G, V352F, E353G, L354S, S356P, P357T and C364H (Fig. 2). As mentioned above, the RCL is a crucial region for the interaction with the target proteases and is responsible for the functional specificity of SERPINs, on which these seven residues are likely to have a significant effect. Also, residue C279R is located at β-sheet C, in the gate domain (Fig. 2), an important region for the full insertion of the RCL after protease cleavage [53]. Thus, amino acid alterations in this region could affect RCL insertion and the SERPIN inhibitory mechanism. Finally, of the remaining sites under positive selection, six residues cluster together at the distal end of the RCL (Fig. 2). Once inserted inside the molecule, the RCL presses the target protease against the bottom of the SERPIN, resulting in the distortion of the protease active site and greatly reducing the catalytic activity of the enzyme [5]. Consequently, amino acids positioned at the distal end of the RCL are in close proximity to the inhibited protease, and substitutions at these sites are probably implicated in the stability of the inhibitor-protease complex.
Furthermore, the branch-site model analysis identified the amino acid K327G and the RCL residues V351G and V352F as being under positive selection in the SERPINB3 duplicate for the H. sapiens, P. troglodytes and G. gorilla lineage. In SERPINB3, amino acids 351G and 352F are located in the RCL, very close to the 354S/355S scissile bond, and may have a relevant functional role in the specificity of SERPINB3 towards cysteine target proteases and in its functional divergence from SERPINB4. Amino acid 327G is located in the highly conserved β-sheet A in the shutter domain (Fig. 2), which has a key role in the SERPIN suicide mechanism. Once cleaved by a protease, the exposed RCL undergoes drastic conformational changes and ends up inside the SERPIN, inserted into the β-sheet A region. As a result, many of the RCL residues become buried, with a major impact on the rate of RCL insertion [5]. Since the RCLs of SERPINB3 and B4 differ in their amino acid compositions, the substitution of a polar residue, lysine (SERPINB4), by a stereochemically different glycine (SERPINB3) could be of crucial importance for an efficient insertion of the SERPINB3 RCL.

Target protease evolution
Maximum likelihood approaches were also used to address the evolutionary signatures of the SERPINB3 and B4 target proteases, namely CTSS, CTSL1, CTSL2, CTSK, CTSG and CMA1, and to check for similar evolutionary paths that could point to a possible coevolution process between inhibitor and target proteases. As for the SERPINB3/B4 phylogeny, the one-ratio (M0) model tests reveal ω < 1, suggesting overall conserved evolution for the CTSS, CTSL1, CTSL2, CTSK and CMA1 phylogenies. However, CTSG shows a higher ω ratio (ω ≈ 0.98), which suggests a relaxation of the selective constraints. Also, the comparison of M0 versus the free-ratio model indicates that the different lineages experienced similar evolutionary rates, except for the CTSS gene (Table 3), in which selective pressures may differ across the branches of the tree. We then proceeded to more powerful and robust approaches to test for evidence of site-specific positive selection across the entire phylogeny or within a specific phylogenetic clade for CTSS, CTSL1, CTSL2, CTSK, CTSG and CMA1. The comparisons of M1a (nearly neutral) versus M2a (positive selection) and M7 (beta) versus M8 (beta and ω > 1) show that the CTSL2, CTSG and CMA1 genes are under positive selection (Table 3), and several codons were identified as subject to positive selection. Interestingly, in a previous work both CTSG and CMA1 were shown to be under positive selection in mammals, possibly as a result of a trade-off between increased response to pathogens and decreased risk of autoimmunity by apoptosis-related genes [54]. Furthermore, branch-site models were used to detect whether positive selection was affecting sites along specific clades in the CTSS, CTSL1, CTSL2, CTSK, CTSG and CMA1 phylogenies and to establish whether selective pressures varied in a similar way as for the SERPINB3/B4 gene tree, which would suggest inhibitor/target coevolution. Interestingly, we found evidence of positive selection (p < 0.05) for the CTSS gene (Table 4) when comparing the foreground H. sapiens, P. troglodytes and G. gorilla clade with the background phylogeny (Fig. S4), and we detected residue 255R as being under positive selection (P > 90%). Therefore, positive selection might be acting on both the SERPINB3 duplicate and CTSS in the H. sapiens, P. troglodytes and G. gorilla lineage, which can point to a possible coevolution between inhibitor and target protease. No statistical significance was obtained for the H. sapiens, P. troglodytes and G. gorilla clade (foreground) in the remaining branch-site tests (CTSL1, CTSL2, CTSK, CTSG and CMA1).
Finally, to evaluate the functional impact of the sites identified as being under positive selection in SERPINB3/B4 and the target proteases, we built 3D structures of human SERPINB3- and B4-target complexes. The HADDOCK outcomes for the best models (Table 5) are consistent with the known inhibitory activity of SERPINB3 and B4 published in previous studies [18,19]. Except for the SERPINB4/CTSS complex, HADDOCK generated good predictions, with i-RMSD ≤ 2 Å and l-RMSD ≤ 5 Å [52]. Interestingly, the poor-quality prediction for the SERPINB4/CTSS complex (i-RMSD ≥ 4 Å and l-RMSD ≥ 10 Å) is consistent with previous in vitro results showing the low inhibitory activity of SERPINB4 towards CTSS, 50 times less than that of SERPINB3 [20]. Figure 3 shows the 3D structures of the SERPINB3/CTSS and SERPINB4/CTSG complexes as representatives of inhibitor-protease complexes. The seven RCL residues identified by the site-model tests as under positive selection in the SERPINB3/B4 phylogeny (Table 1) (V351G, V352F, E353G, L354S, S356P, P357T and C364H) are at the inhibitor/protease interface, in close proximity to the active site of the target protease (Fig. 3). Overall, the RCL plays a critical role in the inhibitory activity of SERPINs, and some studies highlight this notion by showing that the target specificities of SERPINB3 and B4 could be reversed solely by swapping their RCLs [18].
Moreover, as experimentally reported, single amino acid substitutions in the RCL region were unable to convert SERPINB4 into a more efficient cysteine protease inhibitor. In the particular case of CTSS inhibition, different combinations of mutations at SERPINB4 positions P2, P2′, P3′ and P10′ led to an increase in CTSS inhibition, accounting for 80% of the difference between SERPINB3 and B4 activity [55]. Interestingly, the P2, P2′, P3′ and P10′ positions correspond to the residues E353G, S356P, P357T and C364H, respectively, which were found to be under strong positive selection in the present study. Furthermore, the residue V352F, in position P3, is a key residue for the specificity and binding of papain-like cysteine proteases, and in the case of CTSS the preferred P3 residues are bulky and hydrophobic, like the phenylalanine residue in SERPINB3 [18]. In addition, the P1 position (L354S) was found to be under positive selection, and several mutagenesis studies show that the P1 residue is usually the most important for SERPIN protease specificity [5]. The 3D structures of SERPINB3/CTSS (Fig. 3), SERPINB4/CMA1 and SERPINB4/CTSG (Fig. 3) reveal that several residues under positive selection (Table 3 and Table 4) are located in the loops surrounding the enzyme catalytic pocket, which have been shown to be involved in substrate specificity and in enzyme activation [54]. Also, the location of these residues in loops near the enzyme catalytic pocket may suggest a possible role in the 3D conformation assumed by this region. Moreover, X-ray analysis of SERPIN-protease inhibition complexes reveals that the distortion of protease activity is due to the compression of the loops surrounding the protease active site against the base of the SERPIN. Hence, an amino acid substitution in the protease loops neighbouring the active site could have physical implications for the inhibition mechanism [5] and contribute to the functional divergence of SERPINB3 and B4.
Table 3. Phylogenetic tests of positive selection for target proteases.
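The correspondence between RCL residue numbers and P-site labels used above follows directly from the position of the scissile bond. As a minimal sketch, assuming that residue 354 is P1 and residue 355 is P1′ (the 354S/355S bond given in the text) and using the hypothetical helper `p_label`, the mapping can be written as follows, with the prime written as an ASCII apostrophe:

```python
def p_label(residue: int, p1: int = 354) -> str:
    """Schechter-Berger label for a residue, given the residue number of P1.

    Residues N-terminal to the scissile bond are P1, P2, ... (counting away
    from the bond); residues C-terminal to it are P1', P2', ...
    """
    if residue <= p1:
        return f"P{p1 - residue + 1}"
    return f"P{residue - p1}'"


# RCL sites reported as positively selected in the site-model analysis.
for res in (351, 352, 353, 354, 356, 357, 364):
    print(res, p_label(res))
```

With these assumptions, residues 353, 356, 357 and 364 map to P2, P2′, P3′ and P10′, and residue 352 maps to P3, in agreement with the assignments quoted from the mutagenesis studies.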

Tissue expression pattern of SERPINB3 and SERPINB4
A panel of 21 tissues was used to determine the expression pattern of SERPINB3 and B4. SERPINB3 and B4 transcripts were found in uterus, esophagus, lung, prostate, testis and trachea, whereas in bladder and thymus only the expression of SERPINB3 was detected (Fig. 4). These expression patterns are consistent with those obtained by Cataltepe and colleagues, who have shown that SERPINB3 and B4 are frequently co-expressed in several adult human tissues at both the mRNA and the protein levels [27]. In addition, these findings fit the expectation that two recent duplicates are more likely to share cis-regulatory motifs and to display stronger co-expression patterns than two randomly selected genes [56]. The ENCODE annotation of transcription factors by ChIP-seq for SERPINB3 and B4, available in the UCSC database (http://genome.ucsc.edu/), confirms that these duplicates still share several regulatory motifs, including STAT3, CEBPB, FOS and JUN (Fig. S5), which are associated with immunity and apoptosis pathways. Furthermore, upstream of SERPINB3 there is an active regulatory region, identified by an H3K27Ac histone mark and multiple transcription factors, which possibly affects both duplicates (Fig. S5). Therefore, the similar expression pattern of SERPINB3 and B4 is best explained by the low divergence in the cis-regulatory motifs, contrasting with their functional specialization into cysteine and serine protease inhibitors, respectively.
Finally, the expressed sequence tag (EST) profile of the target proteases CTSS, CTSL1, CTSL2, CTSK, CTSG and CMA1 was assessed, revealing an overlap with the SERPINB3 and B4 expression pattern in several tissues (Fig. S6).

Discussion
In the present work, we evaluate the evolutionary forces forging the recent duplicates SERPINB3 and B4 and address their functional impact on protein structure, inhibitor-protease interaction and gene expression regulation. Phylogenetic analysis reveals that a duplication event, approximately 29.6 MY ago, gave rise to the SERPINB3 and B4 paralogs, stably retained in the H. sapiens, P. troglodytes and G. gorilla genomes, but not in P. abelii and N. leucogenys, which carry a pseudogene and an ancestral [...]

Table 4. Likelihood ratio test for branch-site model for target proteases using H. sapiens, P. troglodytes and G. gorilla lineage as foreground. Columns: Gene; Parameter estimates; Foreground vs. Background; -2ΔlnL; Positively selected sites.

In this context we can consider two scenarios: either the duplication led to the acquisition of a completely new function by one of the duplicates, or a subdivision of the ancestral function occurred to accommodate an improved inhibitory activity. Under a subfunctionalization hypothesis, after the duplication event both copies would maintain the original function and several degenerative mutations would be tolerated by SERPINB3 and SERPINB4, due to a relaxation of selective constraints. However, this model fails to explain the different signals of positive selection detected for the entire SERPINB3/B4 phylogeny and for the SERPINB3 clade alone. Likewise, the subfunctionalization theory predicts an expression diversification in which duplicates sharing the same function become specialized in different tissues or developmental stages [38], which is not the case for SERPINB3 and B4. Instead, the neofunctionalization model seems to fit better the evolutionary history of the SERPINB3 and B4 duplicates. According to this model, one copy is kept under purifying selection and retains the original function, while the other is targeted by positive selection and accumulates several amino acid substitutions, ultimately leading to a novel function.
Several studies have demonstrated that positive selection frequently occurs in concert with duplication events in genes involved in brain function and cell growth [57,58], reproduction [59], endurance running [60] and xenobiotic recognition of macromolecules [61]. In addition, several gene families implicated in the immune system were proposed as targets of positive selection [62,63]. There, gene duplications are considered an important mechanism in the enlargement of the host defence repertoire, which is crucial for a rapid response to changing environments and to an increased burden of pathogens [64]. For instance, the tripartite motif (TRIM) protein family, a group of innate antiviral effectors, experienced several episodes of strong positive selection, showing high levels of sequence divergence between paralogs and a wide range of antiviral activities, possibly resulting from different attempts to counteract fast-evolving viruses [65].
Similarly, evidence for positive selection was detected in several members of the SERPIN superfamily. SERPINB11, a highly conserved gene in primates, was lost and resurrected in humans, where the accumulation of several mutations contributed to the appearance of a modified non-inhibitory SERPIN, probably linked to an adaptive response against the emergence of infectious diseases in recent human evolution [66]. Also, in SERPINA2, a 90 MY old duplicate of alpha1-antitrypsin (SERPINA1), several sites seem to be under positive selection in primates, contributing to the emergence of a new advantageous function, possibly as a chymotrypsin-like inhibitor [67]. Conversely, a large deletion in SERPINA2 was proposed to be selectively advantageous in Africans through a potential role in fertility or in host-pathogen interactions (Seixas et al., 2007).
Such recent studies are in agreement with earlier assumptions, based mostly on human and rodent sequences, that established a link between RCL hypervariability, the functional diversity of the SERPIN superfamily and positive selection acting after gene duplication [68][69][70]. Furthermore, Hill and Hastie postulate that these adaptive changes were fixed because SERPINs were challenged by exogenous proteases brought in by infectious agents, which may indicate an ongoing host-pathogen coevolution [69].
Likewise, we propose that the SERPINB3/B4 selective signatures are the result of a coevolution process involving either endogenous or exogenous target proteases. Indeed, the structural and docking analyses are in line with previous biochemical studies [19,55], showing that many of the putatively selected sites fall in regions important for inhibitor function, promoting functional divergence between SERPINB3 and B4. Also, the ability of SERPINB4 to inhibit CTSS, as well as other papain-like cysteine proteases, at a rate 50-fold slower than that of SERPINB3 [55] may suggest that the functional divergence of these two inhibitors is still ongoing. Finally, the scenario of functional divergence is strengthened by the consistency of the selective signatures of the SERPINB4 targets CMA1 and CTSG in the primate (our study) and mammalian phylogenies [54,71]. Since CMA1 and CTSG are powerful proteases involved in programmed cell death (apoptosis) and in the immune response, an evolution of these molecules driven by host defence is also likely. Hence, the selective hallmarks observed throughout the SERPINB3/B4 phylogeny can result from an adaptive response to CMA1 and CTSG evolution.
The overlap of the CTSS and SERPINB3 selective signatures in the H. sapiens, P. troglodytes and G. gorilla clade also points to a possible coevolution of these molecules. Interestingly, both CTSS and SERPINB3 are found in endosome/lysosome structures in macrophages [72] and B cells [28], where CTSS is thought to be engaged in antigen presentation through the degradation of a major histocompatibility complex class II chain [73].
Aside from a role in innate immunity through the regulation of endogenous proteases, SERPINB3 may also be involved in the host-pathogen response through the inhibition of cysteine proteases released during infection by Staphylococcus aureus (staphopains) [33], Leishmania mexicana (CPB2.8), Trypanosoma cruzi (cruzain), T. brucei rhodesiense (rhodesain) and Fasciola hepatica (cathepsin L2) [32]. Notably, SERPINB3 is expressed in the squamous epithelium of mucous membranes, skin and the respiratory system, where it may act as a primary host-defence mechanism by preventing pathogens from crossing and disrupting epithelial barriers. Moreover, the regulation of SERPINB3 expression by the transcription factors STAT3, CEBPB and the FOS/JUN AP-1 complex, which are involved in the development and modulation of the immune system, regulation of cell proliferation and differentiation, mediation of cytokine receptor signaling and control of genes involved in the immune and inflammatory responses [74][75][76], further supports the possible role of SERPINB3 in the immune response.
In conclusion, the present work shows a positive selection signature throughout the SERPINB3/B4 phylogeny, which may be a major force driving the functional divergence of the SERPINB3 and B4 duplicates. Ultimately, adaptive evolution led to different protease specificities, providing SERPINB3 and B4 with the ability to inhibit a broader repertoire of endogenous and exogenous proteases. Furthermore, the retention of the SERPINB3 and B4 duplicates in the H. sapiens, P. troglodytes and G. gorilla clade could have conferred a selective advantage in host-pathogen interactions, due to an adaptive response against infectious diseases in Africa during the evolution of great apes. Also, our results show that the SERPINB3 duplicate is subject to strong positive selection, which could also derive from ongoing host-pathogen coevolution. The interaction of host protease inhibitors with invasive proteases of pathogens can constitute a strong evolutionary pressure for the host to counteract by evolving new and effective inhibitors. Above all, the search for a positive selection signal among inhibitors and target proteases could contribute to a better understanding of the complex interactions involving both types of molecules and of how their imbalance could lead to the onset of different types of carcinomas and immune diseases, with potential therapeutic implications.
Table S1 Genomic locations for the DNA sequences retrieved from the National Center for Biotechnology Information database (NCBI) and University of California Santa Cruz (UCSC) Genomic Bioinformatics database for the nine primate species.