Evolutionary, Structural and Functional Interplay of the IκB Family Members

A primary level of control for nuclear factor kappa B (NF-κB) is effected through its interactions with the inhibitor protein, inhibitor of kappa B (IκB). Several lines of evidence confirm the existence of multiple forms of IκB that appear to regulate NF-κB by distinct mechanisms. Therefore, we performed a comprehensive bioinformatics analysis to understand the evolutionary history and intrinsic functional diversity of IκB family members. Phylogenetic relationships were constructed to trace the evolution of the IκB family genes. Our phylogenetic analysis revealed 10 IκB subfamily members that clustered into 5 major clades. Since the ankyrin (ANK) domain appears to be more ancient than the Rel homology domain (RHD), our phylogenetic analysis suggests that some undefined ancestral set of ANK repeats acquired an RHD before any duplication and was later duplicated and then diverged into the different IκB subfamilies. Functional analysis identified several functionally divergent sites in the ANK repeat domains (ARDs) and revealed that this region has undergone strong purifying selection, suggesting its functional importance in IκB genes. Structural analysis showed that the major variations in the number of ANK repeats and high conformational changes in the finger loop ARD region contribute to the differing binding partner specificities, thereby leading to distinct IκB functions. In summary, our study has provided useful information about the phylogeny and structural and functional divergence of the IκB family. Additionally, we identified a number of amino acid sites that contribute to the predicted functional divergence of these proteins.


Introduction
Nuclear factor kappa B (NF-kB) proteins comprise a family of structurally related and evolutionarily conserved transcription factors that are involved in the control of a large number of normal cellular and organismal processes such as immune and inflammatory responses, developmental processes, cellular growth and apoptosis. The 5 members of the mammalian NF-kB transcription factor family are p65 (RelA), RelB, c-Rel, NF-kB1 (p50 and its precursor p105), and NF-kB2 (p52 and its precursor p100), which associate with each other to form various transcriptionally active homo-and heterodimeric complexes [1,2]. Studies carried out by Baltimore et al. led to the discovery that NF-kB is regulated through its interaction with inhibitor of kappa B (IkB) proteins [3]. The IkB protein family consists of 11 members: Relish, NF-kB1 (p105), NF-kB2 (p100), Cactus, IkBa, IkBb, IkBc, IkBe, IkBf, IkBNS (also known as IkBd), and Bcl3. The defining feature of IkB proteins is the presence of multiple (generally [5][6][7][8] copies of the ankyrin (ANK) repeats, a 33-residue helix-turn-helix protein motif that is found in many proteins ranging from bacteria to humans [4,5]. The primary function of IkBs was originally thought to be the suppression of NF-kB activity; however, recent studies have revealed that IkBs are not simple inhibitors of NF-kB activity, but rather pleiotropic NF-kB cofactors and more complex regulators of gene expression [6,7].
Similar to human p100 and p105, Relish is a Drosophila compound protein, comprising an N-terminal Rel homology domain (RHD) and a C-terminal IkB-like region. The NF-kB precursor proteins p100 and p105 contain ANK repeats at their C-terminal region and prior to processing are capable of inhibiting NF-kB activity [8,9]. In addition, the C-terminal portion of the p105 protein, IkBc has been reported to inhibit the c-Rel, p50/65 and p50 homodimers [10]. Cactus is a Drosophila homolog of mammalian IkB proteins which forms cytoplasmic complexes with NF-kB [11]. In most resting cells, NF-kB dimers associate with one of the typical IkBs such as IkBa, IkBb and IkBe. These IkBs mask the nuclear localization signal (NLS) of NF-kB, thereby preventing its translocation into the nucleus. The activation of cells with appropriate stimuli, particularly Toll-like receptor ligands or various host immune mediators such as proinflammatory cytokines, including tumor necrosis factor a and interleukin (IL)-1 superfamily proteins [12,13,14,15], lead to the phosphorylation of cytosolic IkBa and its rapid ubiquitin-proteasomal degradation, resulting in the release of NF-kB dimers. These liberated dimers then translocate into the nucleus and bind to kB sites in the promoter/enhancer regions of target genes, resulting in their transcriptional regulation via the recruitment of co-activators and co-repressors. This leads to the expression of primary/early response genes, including 3 atypical/nuclear IkB family members, IkBf, Bcl3 and IkBNS, which play vital roles in regulating the transcription of secondary response genes by acting as either activators or inhibitors in the nucleus [16]. To date, 11 IkB proteins have been identified.
Generally, individual IkBs associate preferentially with a particular set of NF-kB dimers, thereby modulating the transcriptional response. Cytoplasmic IkBs bind preferentially to NF-kB dimers that possess at least one p65/RelA or c-Rel subunit, thereby masking their NLS and causing them to be retained in the cytoplasm [17]. One of these proteins, IkBb acts as both a positive and negative regulator [18], and thus has a function similar to that of nuclear IkBs. Among the nuclear proteins, Bcl3 functions as either an activator or an inhibitor of NF-kB in a context-specific manner by regulating its transcriptional activity [19,20]. IkBNS inhibits the production of IL-6 by associating with NF-kB p50 homodimers, thereby preventing them from binding to the IL-6 promoter [21,22]. Another member of this group, IkBf, associates with other nuclear proteins such as STAT3, Brg1 and CEBP1, in addition to NF-kB proteins [2,23,24,25,26,27].
A previous study classified IkB proteins into 3 subfamilies and clustered Relish with the IkB proteins, identifying it as an early offshoot of the IkB family [28]. However, because they used only limited sequences and species to construct their phylogeny, it did not provide a full detailed framework for the whole IkB family. The evolutionary history of the IkB family needs to be confirmed by systematic phylogenetic analysis. Furthermore, whether natural selection has driven the evolution of the IkB proteins remains unknown. In this study, we investigated the phylogenetic and molecular evolution of the IkB family more thoroughly. This has allowed us to predict the structural and putative functional motifs and a number of critical amino acid sites that may be significant in the functional divergence of the IkB members. Additionally, principal component analysis (PCA) was employed to characterize the interconformer relationships between experimentally determined ANK repeat domain (ARD) structures. Comparative modeling was also utilized to build 3D structural models of the IkBs, and these structures were subjected to normal mode analysis (NMA) to identify the motion of the domains. These evolutionary, structural and functional analyses of IkB proteins will enable us to understand better the conservation and classification of IkB proteins, as well as providing insights into the evolution of functional divergence between the members of the IkB family.

Sequence collection
In order to identify the IkB homologs, we used 10 biochemically characterized IkB protein sequences from Homo sapiens and Drosophila melanogaster sources (IkBa_Hs-CAB65556, IkBb_Hs-AAH15528, IkBe_Hs-NP_004547, Bcl3_Hs-NP_005169, IkBNS_Hs-Q8NI38, IkBf_Hs-Q9BYH8, NF-kB1_Hs-AAA36361, NF-kB2_Hs-CAC08399, Cactus_Dm-AAA85908, and Relish_Dm-AAB17264) as query sequences. The full protein sequences of the vertebrate and invertebrate species were selected from the NCBI database. Sequences that were not included in NCBI were obtained through PSI-BLAST, ENSEMBL, Uniprot, Interpro and DOE Joint Genome Institute databases. Sequence searches were performed using BLASTP [29] against a nr database [30] with default parameters. Each IkB protein search produced approximately 500 hits. We identified the biochemically characterized proteins manually from the hits and used them as queries for repeated BLAST searches. From the total BLAST output, we selected those IkB homologs with amino acid identity .35% over a stretch of ''.X ,Y''. The values of ''X'' and ''Y'' varied depending upon the specific IkB protein. The lengths of most of the IkBs are quite dissimilar. For instance, IkBa is 317 residues in length, whereas IkBf is 718 residues long.
Due to these discrepancies in length, we utilized the following specific cut-off values for each IkB protein: IkBa, 250-350; IkBb, 350-450; Bcl3, 350-550; IkBe, 400-600; IkBNS, 250-400; IkBf, 600-900; NF-kB1, 850-1100; NF-kB2, 875-1100; Relish, 850-1100; and Cactus, 400-600 residues. We eliminated all hypothetical, putative, predicted, redundant and unnamed proteins from our dataset. This cut-off analysis identified 545 sequences from a list of sequences for 172 unique species; the sequences were subjected to SMART, Pfam [31] and InterPro [32] analyses to identify the domain architecture. On the basis of this analysis, we refined the search results manually to further reduce hits with partially conserved functional domains and other false positives and finally selected the remaining 386 sequences from 124 unique species for further analysis. Pairwise sequence comparisons of all of the IkB proteins were carried out using the Needleman-Wunsch alignment program from the EMBOSS package [33,34].

Sequence alignments and phylogenetic analysis
MSA is the second critical step in phylogenetic analysis and the alignment quality may have an enormous impact on the final phylogenetic tree. All IkB sequences (vertebrate, invertebrate and whole IkBs) were imported into the Geneious Pro software v5.5.7 (Available from http://www.geneious.com/), where the initial alignment was performed using the plugin MAFFT v6.814b [35] used in Geneious software with default parameters (auto algorithm; scoring matrix = BLOSUM62; gap open penalty = 1.53; and offset value = 0.123). The MSAs were manually inspected using Jalview [36]. A few particularly gap-rich positions, poorly aligned and divergent regions from the alignments were excluded from the phylogenetic analysis. The final dataset contained 340 sequences from 111 organisms (Table S1). The MSAs were used to construct 3 phylogenetic trees by using 2 different and independent approaches: the NJ and Bayesian methods. The NJ and Bayesian phylogenetic tree reconstructions were performed using Geneious Pro v5.5.7. The molecular distances between the aligned sequences were calculated using the Jukes-Cantor genetic distance model. All gaps and missing data in the alignments were accounted for by pairwise deletions. Branch points were tested for significance by bootstrapping with 1000 replications. Bayesian analysis was conducted using the MrBayes v3.1.2 [37] plug-in implemented in Geneious. The MrBayes parameters were as follows: prset aamodelpr = mixed; lset rates = gamma; Ngammacat = 4; mcmcngen = 1100000; samplefre = 200; nchains = 4; and starting tree = random. Branch points were tested for significance by bootstrapping with 1000 replicates.

Homology modeling
The modeling procedures used in the current study have been described previously [2,38,39,40]. A homology model of IkBe was built using the top 2 ranked templates (Bcl3 (1K1A) and IkBb (1K3Z)), whose sequence identities were 42% and 43.89%, respectively. In the case of IkBNS and IkBf, models were built using Bcl3 (1K1A), which shared the highest sequence identity with the 2 target sequences (36.4% with IkBNS and 37.28% with IkBf), as a single template. IkBa (1IKN) served as a suitable template for Cactus. The sequence identity of Cactus with IkBa was 33%. Another ARD, 1N11, served as a suitable template for Relish, whose sequence identity was 20%. Bcl3 (1K1A) acted as a template for both NF-kB1 and NF-kB2, whose sequence identities were 40% and 39%, respectively. The target-template sequence alignments were performed using MUSCLE [41]. All of the models were built using Modeller 9v8 [42]. We constructed 3D models using a distance restraint algorithm based on the MSA of the target sequence with the template structures by applying the CHARMM force field [43]. An optimization method, which involved conjugate gradients and MD-simulated annealing, was employed to minimize violations of spatial restraints. For model building, the default parameters included in the ''automodel'' class were used. Subsequently the 20 models were subjected to automatic loop refinement, from which the best final model was selected based on stereochemical and energetic evaluations. The model was further evaluated with DOPE (Discrete Optimized Protein Energy), a pairwise atomic distance statistical potential that assesses atomic distances in a model relative to those observed in many known protein structures [44]. DOPE is based on an improved reference state which corresponds to the non-interacting atoms in a homogeneous sphere with the radius dependent on a sample native structure. It accounts for the finite and spherical shape of the native structures. The stereochemical quality of these proteins was assessed using ProQ [45] and MetaMQAP [46]. The final models were subjected to energy minimization using the AMBER 03 force field implemented in YASARA [47].

Functional divergence
Type I and II functional divergence among the IkB subfamilies was examined using the DIVERGE 2.0 software [48,49], which implements the tests suggested by Gu [37] that can be used to determine whether the coefficients of divergence (h I and h II ) are significantly greater than 0. The IkB MSA was used as an input in DIVERGE 2.0, and a rooted NJ tree was generated using the Poisson correction distance measure. This tool utilizes the phylogenetic tree to assess site-specific changes in evolutionary rates within amino acid alignments when comparing subclades. Using type I functional divergence, the coefficient of evolutionary functional divergence (h) can be measured to assess the changes in site-specific evolutionary rates, i.e., h = 0 indicates no functional divergence, while increasing values indicate increasing functional divergence, with h = 1 being the maximum. Utilizing h, we tested for significant functional divergence (LRT, p,0.05) for each of the pairwise comparisons of the 10 different IkB subfamilies, which were subdivided into taxonomic groups (1 or 2 representative sequence from each species class), where appropriate, based on the phylogenetic tree. For instance, the IkBa sequences were divided into mammalian IkBa (Homo sapiens and Mus musculus), avian IkBa (Gallus gallus), reptilian IkBa (Anolis carolinensis), amphibian IkBa (Xenopus tropicalis), and actinopterygian IkBa (Latimeria chalumnae) sequences. Using type II functional divergence, residues with radical biochemical changes between the subfamily groups can be identified. We focused on radical changes among the vertebrate IkB subfamilies and applied a cut-off value of h.17 for sitespecific posterior probabilities. Type II functional divergence was analyzed for the following 8 IkB subfamilies: IkBa, IkBb, IkBe, IkBNS, IkBf, Bcl3, NF-kB1 and NF-kB2.

Analysis of selective pressure
The detection of selective forces acting at a single amino acid site in a given protein sequence may be of importance in understanding the processes underlying the evolution of that particular protein. We carried out site-wise Ka/Ks analysis of IkB gene coding sequences using the Selecton server version 2.4 [50,51], an online program to detect the selection forces acting at each amino acid site in the IkB sequences. The Selecton server uses a non-aligned file containing homologous coding DNA sequences or a codon-aligned file of coding DNA sequences in FASTA format as the input. We submitted the aligned IkB gene coding sequences individually for each IkB subfamily member to Selecton as the input. Among these coding sequences, IkB representative sequences were used as the query. Subsequently, we used the Selecton server to analyze the entire IkB coding sequences and identify the residues involved in strong purifying selection.

PCA of ARD crystal structures
Using the 3D atomic coordinates of Bcl3 as the query, we identified homologous structures in the PDB by using minimum cut-off values of 20% sequence identity and 40% coverage. In this analysis, 71 solved structures were identified. PCA was employed to examine interconformer relationships; this method has been described in detail elsewhere [52]. This analysis allowed us to view the distribution of the dataset structures in the subspace spanned by PC1 and PC2, and therefore to discriminate or cluster the conformations based on their most distinctive structural similarities or dissimilarities. The analysis was performed with software from the ProDy package [52,53].

Normal mode calculations
The anisotropic network model (ANM) is a coarse-grained model for the protein dynamics, which is used to investigate the vibrational motions [54,55]. It uses elastic network methodology (ENM) and represents the system in residue level. Each node is represented by Ca atom of a residue and the overall potential is the sum of the harmonic potential between the interacting nodes. Each pair of residues separated less than a cutoff distance is assumed to be connected by harmonic spring. Information about the orientation of each interaction with respect to the global coordinate system is considered within the Force constant matrix and allows prediction of anisotropic motions.
A Hessian matrix can describe the force constant of the system. If the native structure corresponds to the energy minimum, then we can expand the potential energy in the Taylor series in terms of small positional deviations of residues DR from the mean equilibrium values R O For small deviations one can neglect terms higher than the second. Taking V(R O ) = 0, and noticing that the first derivative is equal to zero, we can rewrite the above equation as: Where H is the matrix of second derivatives. If the distance between two residues i and j is less than or equal to a cutoff value (in this study, we maintained the cut-off distance (r c ) between the Ca atoms as 15 Å ), then we can calculate H 3i+k, 3j+l (K, l = 1-3) by applying harmonic potential in the computation of the second derivative; otherwise it is set to 0.
The Hamiltonian of the system can be written as:

MDRzHDR~0 ð3Þ
From the assumption of the harmonic nature of oscillations, we can express the Hessian as: Where U is a matrix of eigenvectors, and D is a diagonal matrix of eigenvalues.
3N-6 non-zero eigenvalues can be calculated according to: Where l i are the eigenvalues of H sorted by their size from small to large and U i the corresponding eigenvectors. The eigenvectors describe the vibrational directions and relative amplitude in the different modes.
The mean square fluctuation of individual residues can be obtained by summing the fluctuations in the individual modes as follows: ProDy and the ANM were used to obtain simulated functional motions from the IkB structures [52,53,55].

Domain organization and superimposition of IkBs
The domain organization of each IkB protein family member is shown in Figure S1A. All IkB proteins possess common domain architecture, such as an N-terminal region followed by a Cterminal ARD. Proteins in the NF-kB superfamily of transcription factors are related through a highly conserved N-terminal RHD, which contains sequences required for DNA binding, dimerization, and nuclear localization [56,57]. In most cases, NF-kB proteins have C-terminal IkB-like inhibitory domains consisting of ANK repeats, which must be removed for their activation.
The number of ARDs varies among these IkBs, which may be important for determining their binding specificity. In cytoplasmic IkBs (IkBa, IkBb and IkBe), the N-terminal regulatory region contains phosphorylation and ubiquitination sites for signaldependent degradation and nuclear export signals that contribute to their observed cytoplasmic localization. Subsequently, the Cterminal ARD plays a pivotal role in their physical interaction with Rel/NF-kB subunits, and thereby modulates the cellular responses [2,17]. The C-terminal region to the ARD in cytoplasmic IkBs is rich in proline, glutamic acid, serine, and threonine (PEST) residues, forming a so-called PEST motif, which is indispensable for interactions with the NF-kB dimer and its subsequent removal from DNA [58]. The PEST motif is common among proteins that display a rapid turnover in cells [59]. To check whether all IkB family members contain the PEST motif, we utilized ePESTfind from the EMBOSS package [33,34]. We found that the cytoplasmic IkBa, IkBb, Relish, Cactus and Bcl3 proteins contain a PEST motif at their C-terminal end, whereas IkBf and IkBNS possess a PEST motif in their N-terminal region, and the remaining IkBs do not possess any PEST motif. Unlike cytoplasmic IkB PEST motifs, the PEST motif of nuclear Bcl3 undergoes phosphorylation of its serine residues during posttranslational modification, causing it to have a function similar to that of cytoplasmic IkBs [2,60,61,62]. No functional studies of the N-terminal PEST motif are currently available; hence, future biochemical studies are required to clarify its function.
Generally, the ANK repeat is a 33-amino acid consensus sequence motif in our constructed models. This motif forms 2 antiparallel a-helices, followed by a loop of variable length at a right angle. Each repeat begins and ends with short hairpin turns that protrude away from the a-helix. This non-globular fold is stabilized by intra-and inter-repeat hydrophobic interactions. In Figures S1B and C, we show the structural superimposition of typical IkB proteins (IkBa, IkBb, IkBe, IkBf, IkBNS and Bcl3) and IkB-like domain containing proteins (Cactus, Relish, NF-kB1 and NF-kB2) separately. The structural superimposition of IkBs (Figures S1B and C) show that the regions between the first, fourth and fifth ANK repeats are oriented differently and do not align with those in the rest of the IkB members. The major deviation among the IkB members mainly occurs in the finger loop region of ANK1, ANK3, ANK4, ANK5 and ANK6 (Figures S1B and C, marked with a star). Additionally, minor deviations were observed within and between the ANK repeats. This shows that structural variation in the finger loop region might be responsible for the binding specificity of IkBs. This hypothesis is supported by our PCA analysis of the ARD, as described below, which showed that all ARD-containing proteins were structurally similar and major conformational changes took place only in the finger loop region, indicating that it is primarily responsible for their functional divergence.

Identification and sequence analysis of IkB family members
Biochemically characterized IkB representative sequences were retrieved from the NCBI database. We used these sequences as queries to search other non-redundant (nr) databases (detailed in the Materials and Methods section) and identified 386 homologous proteins in vertebrates and invertebrates. All-against-all pairwise sequence alignments were carried out for all 386 sequences to determine the inter-and intra-similarity across the whole IkB family. Inter-subfamily similarity values varied from 4% to 82% and intra-subfamily similarity values ranged from 12% to 99.7%. The sequence identity within the IkB family members ranged from 4% to 99.7% ( Table 1), suggesting that the gene duplications that gave rise to them are relatively ancient. Due to these ancient genome duplication events, we observed several IkB homologs in different species.
In order to identify the similarities or variations among the IkB ARDs, we took representative sequences from each clade and performed multiple sequence alignment (MSA) by using the MAFFT (rapid MSA tool based on Fast Fourier Transform) software [35]. We observed that large gaps or improper alignments were mainly found in the N-terminal region (data not shown); therefore, we trimmed off the N-terminal region and used only the C-terminal ARD for further MSA (Figure 1). In this analysis, we observed the following large insertions between or within the ANK repeats: (i) IkBb contains a 41-residue insertion between ANK3 and 4; (ii) Cactus contains a 31-residue insertion between ANK3 and 4; (iii) IkBf contains a 27-residue insertion within ANK4; (iv) Relish contains a 13-residue insertion within ANK 1; and (v) IkBNS contains a 20-residue insertion within ANK4. In addition to the MSA of IkB representative sequences, an alignment of all IkB homologs was conducted, which clearly showed that the functional regions of each IkB protein were highly conserved across species, indicating similar roles in different species (data not shown).

Phylogenetic analysis: 3 robust IkB protein subfamilies
The available IkB sequences were retrieved from the major sequence databases. Querying major databases with the full-length representative sequences from the 10 IkB subfamilies identified 545 homologous proteins in vertebrates and invertebrates. On the basis of the filtering criteria, a few sequences were discarded and the final dataset included 340 sequences from 111 species that were subjected to phylogenetic tree reconstruction (Table S1). The final dataset (340 sequences) included 64 IkBa, 35 IkBb, 36 IkBe,  25 Bcl3, 32 IkBNS, 38 IkBf, 14 Cactus, 25 Relish, 36 NF-kB1 and 34 NF-kB2 sequences. To explore the phylogenetic relationships among the IkBs, we constructed a rooted tree by utilizing the neighbor joining (NJ) and Bayesian methods for the final dataset derived from 111 species. The results obtained from these 2 phylogenetic methods produced similar tree topologies.
In the phylogenetic tree reconstruction for all of the IkB sequences (328 sequences), the sponge Amphimedon queenslandica (GenBank ID: XM_003387518.1) was considered as an outgroup. This species has 2 protein sequences with ANK repeats that appear to be IkB proteins; the C-terminal repeats of its NF-kB protein are approximately 40% identical to those of human NF-kB proteins, and the ANK repeats of another gene are also quite similar (40-45% identity) to human IkBa and Bcl3 as well as to Nematostella vectensis IkB and Bcl3 [57]. The localization of the last internal branch of the NJ-distance tree with bootstrap values above 70% allowed 5 major clades or clusters among the IkB proteins to be distinguished in the first approach in addition to the A. queenslandica outgroup, each of which is shown in a unique color ( Figure 2). These major clades included (i) Relish with NF-kB1 and NF-kB2 (90.6% bootstrap value); (ii) Cactus with IkBa (83.9% bootstrap value); (iii) IkBe (100% bootstrap value); (iv) IkBb (91% bootstrap value); and (v) Bcl3 with IkBf and IkBNS (nuclear IkB proteins; 70.8% bootstrap value).
It is interesting to study the topologies inside these major clades. According to the distance tree, clade I consisted of 2 groups/ subfamilies, i.e., the Relish group and the NF-kB1 and NF-kB2 group. Our distance analysis clearly clustered Relish with the other IkB proteins and identified it as an early offshoot, thereby suggesting that Relish is the ancestor of the IkB family. Moreover, since Relish clustered with the NF-kB paralogs, it can represent a direct arthropod homolog of the NF-kB genes. The second group represents typical paralogous genes, NF-kB1 and NF-kB2 that descended from the duplication of a unique ancestral gene. In clade II of the distance tree, the Cactus gene represents the homolog of the IkBa gene. Cactus also represents a direct arthropod homolog of the IkBa genes in a similar manner to Relish. The branching order of the IkBa orthologs was well supported by the bootstrap values and fits well with the evolution of the species (Osteichthyes , Amphibia , Reptilia , Aves , Mammalia). However, this method separated IkBe and IkBb into 2 distinct clades (clades III and IV). Moreover, it remains unclear from the phylogenetic tree whether these 2 subfamilies are more or less closely related to any other member of the IkB family. Clade V constitutes the nuclear IkB proteins that were clustered into 2 groups, among which Bcl3 forms the first group and IkBf and IkBNS form the second group. From clade V, it is apparent that the IkBNS gene represents the direct homolog of the IkBf gene. Moreover, it is clear that vertebrate-specific gene duplication gave rise to IkBNS and IkBf, which are linked together with a 100% bootstrap value. The overall topology of the whole IkB phylogeny tree was well supported by high bootstrap values ranging from 70% to 100%. Additionally, the branching order of the IkB subfamily members fits well with the evolution of the species.
In order to validate the clustering organization of the IkB family members (Figure 2), we constructed IkB vertebrate (297 sequences) and invertebrate (43 sequences) phylogenetic trees ( Figures 3A  and B). Interestingly, these 2 trees produced a clade organization identical to that of the whole IkB phylogeny tree. In both IkB vertebrate and invertebrate tree reconstructions, N. vectensis Bcl3 was considered as an outgroup. Additionally, all of the subclades in the vertebrate and invertebrate phylogenetic trees had significant bootstrap values ranging from 70% to 100%.
The evolution of the IkB genes in each of these clades recapitulates the phylogeny of the species. The Relish and Cactus genes were only present in invertebrates (Arthropod and Mollusca lineages), whereas the NF-kB1 and Bcl3 genes were present in vertebrate and invertebrate lineages. Remarkably, N. vectensis, the lowest invertebrate species, appears to have a genuine Bcl3 gene. The sporadic appearance of Bcl3 throughout evolution may be due to its distinct properties among the IkB protein family [57]. Except for IkBe, the other IkB genes, including NF-kB2, IkBa, IkBb, IkBNS and IkBf are only present in vertebrate lineages. During sequence searching, we identified one invertebrate nematode species (Trichinella spiralis; NCBI accession ID XP_003377889) in the IkBe subfamily. However, we did not include this sequence in our analysis due to our length filtering criteria. Most of the vertebrates examined have exactly one gene orthologous to each of the IkBs. However, there are a few exceptions in Aves, which lack the IkBb and Bcl3 genes; Reptilia, which lack the NF-kB1 gene; and Amphibia, which lack the IkBb gene. Taken together, these observations clearly indicate that the 8 mammalian IkB proteins arose due to the requirements for IkB proteins with distinct affinities for different NF-kB transcriptional regulatory processes. From our data, it is clear that most of the mammalian IkB genes have been duplicated, and that the copies diverged from each other prior to the divergence of the earliest mammalian lineage. Some of the non-mammalian vertebrates appear to have phylogenetic affinity with some of the mammalian lineages, with significant bootstrap support. This is particularly evident for Figure 1. Sequence comparison of IkB ARDs. MSA of the ARDs from the representative IkB family members along with the Bcl3 ARD crystal structure. The amino acid numbers corresponding to the ARD regions for each representative sequence are shown beside each IkB protein name. The residues involved in type I and II divergence are marked in green 7-point stars and purple circles, respectively. The highly conserved regions in the sequence alignment of IkB ARDs are represented in blocks. At the top of the sequence alignment, the secondary structure prediction in relation to the structure of Bcl3 is shown. doi:10.1371/journal.pone.0054178.g001 Figure 2. Phylogenetic relationships among all IkB family members determined using the NJ method. A total of 328 protein sequences were included in this analysis. Bootstrap scores higher than 70% have been provided. The sponge Amphimedon queenslandica was considered as an outgroup. The clustering of IkB family members into 5 major clades is shown. Each IkB member is represented by a unique color in the phylogenetic tree: IkBa (magenta), IkBb (red), IkBe (cyan), IkBNS (dark green), IkBf (brown), Bcl3 (purple), Cactus (orange), Relish (black), NF-kB1 (blue) and NF-kB2 (light green). Taxa terminologies are presented as the IkB protein name followed by an abbreviated form of the species name. Please refer to the Results section and Table S1 for their description and species names, respectively. doi:10.1371/journal.pone.0054178.g002 several proteins from Aves, Amphibia, Reptilia, and Osteichthyes (fishes) that tend to branch with their mammalian orthologs. Taken together, these observations clearly indicate that many of the IkB-like genes must have duplicated prior to the Mammalia-Aves, Mammalia-Amphibia, Mammalia-Reptilia, and Mammalia-Osteichthyes divergences. Finally, non-mammalian genomes contain noticeably fewer IkB subfamily members when compared with mammalian genomes, indicating that the mammalian genome utilizes multiple NF-kB transcriptional regulatory processes.
In conclusion, the IkB family can be divided into 3 robust subfamilies according to their structural, domain and clade organization: (i) Relish, NF-kB1 and NF-kB2 proteins, which are characterized by the presence of an RHD in their N-terminal regions and ANK repeats in their C-terminal regions; (ii) Cactus, IkBa, IkBb and IkBe proteins, which are characterized by the presence of 6-7 ANK repeats; and (iii) the inducible nuclear IkB proteins IkBf, IkBNS and Bcl3, which are characterized by the presence of 7 ANK repeats ( Figure S1A). Our current phylogenetic analysis using different methodologies suggests that the IkB subfamilies might have diverged and been duplicated from a unique ancestral set of ANK repeats that had acquired an RHD, i.e., Relish before any duplication. Further, this analysis identified Relish as the earliest lineage and the presence of only few paralogous genes (NF-kB1 and NF-kB2; IkBf and IkBNS) within the IkB subfamilies.
Sites contributing to type I and II functional divergence among the IkB subfamilies Gene duplications provide a means to develop novel biological functions, and changes in protein function may then cause different functional constraints on the subsequent evolution of the duplicated genes. Generally, the functional divergence of a protein family occurs after major evolutionary events such as speciation or gene duplication [63]. In order to estimate the relationship between gene evolution and the functional divergence of the IkB protein family, we conducted pairwise functional divergence analysis between the IkB genes by using DIVERGE 2.0 [48,49]. In this type I functional analysis, we considered only the C-terminal ANK repeat region from 10 IkB subfamilies, which were consequently subjected to a posteriori analysis. Since the number of sequences per subfamily was quite large, we grouped a minimum of 4 sequences in each clade by both subfamily and taxonomic class, as required by DIVERGE. Using MSA (with a minimum of 4 sequences per subfamily for a total of 51 sequences), maximum likelihood tree topology, and the IkBa crystal structure, we determined the evolutionary rates of functional divergence of the IkBs using DIVERGE. The coefficient of evolutionary functional divergence (h), its standard error, and the maximum likelihood ratio test (LRT) were determined for each pairwise comparison ( Table 2). These 10 subfamily groups resulted in 45 pairwise group comparisons, of which 13 comparisons demonstrated statistically significant divergence (values shaded in gray in Table 2; LRT, p,0.05). Additionally, the type I functional divergence analysis showed medium to high h I values between all IkB pairwise comparisons, except IkBb/NF-kB1 ( Table 2). The h I values were .0 and statistically significant at the 1% level in accordance to LRT, thereby providing solid evidence of type I functional divergence between the IkB subfamilies. In order to identify the amino acid sites that might be involved in the functional divergence of the IkB family members, the significant values of h I were compared using a posteriori probability analysis with a suitable cut-off value. A site that showed h I .0.85 was thought to be a potential type I site. A total of 18 potential type I sites were identified in all pairwise comparisons. All of the predicted functional amino acid sites were found to be clustered among all of the ARDs of the IkBa subfamily member (Figures 1,  4A and B). We provided the residue positions based on the human IkBa reference sequence. Furthermore, the results show that the amino acid residues that are critical for functional divergence are located predominantly in the finger loop regions, but a few are also present in the helical region (Figures 1, 4A and B). It is noteworthy that the IkB finger loop regions mediate important interactions with different NF-kB subunits, thereby modulating the transcriptional process.
Amino acid residues with radical biochemical changes between the IkB subfamilies were identified via type II functional analysis. We did not perform IkB pairwise comparisons among all IkB subfamilies but performed pairwise comparisons within 8 vertebrate IkB subfamilies. We grouped a minimum of 4 sequences in each clade by both subfamily and taxonomic class (1 or 2 representative sequences for a total of 41 sequences), as required by DIVERGE. In contrast to the significant estimates of h I , no evidence for type II functional divergence was observed (data not shown) between any of the pairwise groups with extremely small h II values. Notably, most of the residues received very low scores, indicating that only a small portion of the amino acid residues have been involved in this type of functional divergence. Although there was no clear evidence for type II functional divergence (p.0.05), we performed a further analysis to determine whether any potential sites show evidence for type II functional divergence. We supposed that if the a posteriori ratio test value of an amino acid site was .17, then it could be considered as a potential type II site. Thus, 33 amino acid sites show a typical shift of amino acid properties at conserved residues (Figures 1 and 4C and Table 3). Most of these sites were distributed throughout the finger loop regions. The largest number of radical changes identified in type II functional analysis was found in the comparison between IkBa/ IkBe, IkBa/IkBNS, and IkBa/IkBf. The highest site-specific posterior probabilities were identified in the IkBa/IkBf and IkBa/IkBNS pairwise comparisons, suggesting that the binding specificity toward NF-kB subunits/partners may be different between cytoplasmic and nuclear IkB proteins. The results from the analysis of type I and type II functional divergence (Tables 2 and 3) suggested that the IkB genes should be significantly functionally divergent, due to the differences in their evolutionary rates and amino acid properties at specific sites. Hence, the functional divergence of these proteins possibly reflects the existence of long-term selective pressure.

Selective pressure at amino acid sites in the IkB family members
In order to test for the presence of positive selection at individual amino acid codons, we used the Selecton server [50,51]. We estimated the number of synonymous (Ks) and nonsynonymous (Ka) nucleotide substitutions per site for all of the IkB subfamily members. Likelihood rate tests were performed between model M8 (positive selection enabled evolutionary model, beta + v$1) and M8a (null model with no positive selection, beta + v = 1) on IkB sequences. The M8 evolutionary model showed that the overall Ka/Ks ratio was ,1 for all IkB subfamily members, except IkBb, IkBNS, Relish and NF-kB1, indicating that the IkB proteins have undergone strong purifying selection pressure (data not shown). Moreover, the M8a null model showed a statistical significance for all IkB subfamily members. The major site-specific positive selection amino acids were: IkBb: 333S and 337Q; IkBNS: 8V; Relish: 270D, 473N, 480K, 485A, 487P, G506, 510H, 550A, 556R, and 575T; and NF-kB1: 435M, 477G, 482T, 488G, 493S, and 494A. Interestingly, the sites that underwent positive selection were located outside the N-and Cterminal ARD regions, thereby indicating that the IkB ARDs have undergone strong purifying selection, suggesting their importance for the function of IkB proteins.
In addition to the above-mentioned sequence and phylogenetic analyses of the IkB family, we conducted detailed structural studies of the IkB ARDs to identify the differences responsible for their functional divergence. Firstly, we analyzed ARD crystal structures and identified the inter-conformation relationships. Secondly, we modeled the IkB structures and identified their domain motions by utilizing an anisotropic network model (ANM).

PCA and ANM analysis of solved ARD protein structures
Since our structural analysis focused on the ARDs of IkBs, we searched for similar types of ARD structures by using the RCSB Protein Data Bank (PDB). The complete list of 71 PDB identifiers can be found in the supporting information (Table S2). The objective of our PCA analysis of solved structures containing ARDs was to compare the functional variations in structures observed experimentally with those predicted from an ANM (physical theory and method) based on native contact topology. Figure 5A shows the projection of ARD structures onto the subspace spanned by the 2 principal axes PC1 and PC2. The points therein represent the distribution of 71 ARD structures. However, the ensemble represents 227 conformations because some of the structures in the dataset (Table S2) were represented by multiple NMR models. Each model was added to the ensemble as an individual conformation.
PC1 and PC2 were found to account for 62% of the total variance in structure. The projection of ARD structures onto PC1 and PC2 showed a clear separation of structures into 4 clusters according to the number of ANK repeats and function. Cluster 1 contained 3 structures, i.e., 3NOG, 3NOC, and 2J8S, which have 5 ANK repeats and function as signal transducers. Cluster 2 contained only a single structure, 1N11, which has 12 ANK repeats and acts as an ion transporter. Similarly, cluster 4 contained only a single structure, 1YCS, which has 4 ANK repeats and acts as a cell cycle regulator. However, cluster 3 contained 66 structures (including IkB protein structures), which have 6-11 ANK repeats and function as either transcriptional initiators or regulators. It should be noted that all of the IkB family proteins exhibited similar conformations with an RMSD of ,5 Å , whereas the other ARD structures showed conformational changes with an RMSD ranging from 4 to 30 Å while binding to other proteins, as shown in the inset of Figure 5A.
The structural variations among the 4 clusters (PDB IDs: 1N11, 1YCS, 1K1A, and 3NOG) are shown in Figure 5B. Major conformational variations are clearly observed in the finger loop regions ( Figure 5B, marked with black dotted circles), which are primarily responsible for the functional divergence among the ARD proteins. The correlations between the lowest 3 ANM modes for the Bcl3 structure (representing IkBs) and the top-ranking 3 PC directions are listed in Table 4. Among them, ANM1 had a correlation of 0.22 with PC1. The projections of the ensemble of structures onto ANM1 and PC1 yielded a correlation of 0.64 ( Figure 5C). This correspondence underscores the robustness of low-frequency modes and demonstrates the functional importance of the ARD in Bcl3 (and IkBs), among other ARD functions. Overall, these results show that even though these ARD domains are from diverged families, they form the same folded structure, and the variations are mainly observed in the number of ANK repeats and in the finger loop regions, which are primarily responsible for the functional divergence among ARD-containing proteins.

Homology modeling of IkB proteins
To date, the crystal structures of 2 cytoplasmic IkBs (IkBa and IkBb) and 1 nuclear IkB protein (Bcl3) have been solved [60,62,64]. We built the remaining IkB protein structures (IkBe, IkBf, IkBNS, Cactus, Relish, NF-kB1, and NF-kB2) using comparative modeling. On the basis of sequence identity, the nuclear IkB protein Bcl3 is a suitable template for IkBNS, NF-kB1, NF-kB2, and IkBf 3D modeling, whereas for IkBe, both IkBb and Bcl3 can serve as templates. Furthermore, IkBa (1IKN) serves as a suitable template for Cactus, while another ARD, 1N11, serves as a suitable template for Relish. During targettemplate alignments of IkBNS, IkBf, Cactus and Relish, we identified insertions (20,27,31, and 14 amino acids in length, respectively), which did not align with the template; however, we did not delete these insertion regions when building the models ( Figure S2). Model evaluation involves the analysis of the geometry, stereochemistry and energy distribution of the final models. The evaluation listed in Table 5 indicated that all of the models were of high quality in terms of overall packing. The final structure of Cactus contained 6 ANK repeats, whereas those of IkBNS, IkBe, IkBf, Relish, NF-kB1 and NF-kB2 contained 7 Sites of radical change were detected in pairwise type II functional analysis of vertebrate IkB taxonomic groups. The residue positions are provided based on the human IkBa reference sequence. The amino acid changes between subfamilies along with their property changes are provided. + represents positively charged amino acids; 2 represents negatively charged amino acids; *represents the residue position based on the human IkBa reference sequence; # represents the residue position in the full vertebrate IkB alignment. doi:10.1371/journal.pone.0054178.t003 ANK repeats. All of the predicted structures were in agreement with the secondary structure predictions made using PSIPRED [65]. In order to identify the domain motion for each IkB member, 6 IkB models (IkBa, IkBb, IkBe, IkBNS, IkBf and Bcl3) were subjected to ANM analysis.

Conformational flexibility of the IkB proteins
The protein peptide backbone and side chains exist in a state of constant motion; these internal local movements are quite fast, i.e., in the femto-to pico-second time scales [66]. In a previous study, we conducted molecular dynamics (MD) simulations (15 ns) for most of the IkB proteins [2]. We found that the proteins reach equilibrium after 3 or 4 ns and remain stable thereafter, without undergoing any major conformational changes. We then extended our MD simulations to 50 ns and noticed that the proteins reached an equilibrium state after 4 ns and remained stable until 40 ns, after which they entered a metastable state and then reached a second equilibrium state after 45 ns (manuscript in preparation, data not shown). These results clearly suggest that IkB proteins require a slower conformational transition, involving medium-to large-scale motions (on a microsecond scale). In order to identify the mobility occurring within the ANK repeat, we utilized ANM analysis. The rationale for using NMA, and, in particular, of using elastic network models (ENMs) over traditional methods such as MD, was based on its fast computational time and ability to predict large-scale fluctuations and global motion. The IkB protein family and the finger loop regions within the ARDs play an important role in binding specificity. The ANM was expected to provide significant information about these regions.
Fluctuations predicted by the ANM correlate well with the experimental data Among the 6 IkB structures, 3 were taken from the crystallographic coordinates. We analyzed the functional motion of these structures using the ANM. Importantly, the crystal structures can reproduce experimental data, e.g., temperature, and also provide cooperative, global protein motions at low frequencies. A comparison of the calculated (theoretical) and experimental fluctuations shown in Figure 6 indicates that there is qualitative agreement between the predicted mean square fluctuation and the experimental B-factors. This analysis shows that the ANM is successful in reproducing the experimental results. Apart from the crystal structures, we predicted the theoretical fluctuations for the modeled proteins. In the fluctuation plots, the maxima indicate the areas of the protein structure with high flexibility, whereas the minima correspond to regions with restricted flexibility. The highly flexible regions generally coincided with the positions of ANK1, ANK6, or ANK7 and the finger loop regions, whereas the low mobility areas correspond to residues in the helical segment.

Direction of motions
We analyzed the direction of motion for the first 3 lowfrequency modes of each IkB protein. The mobile regions identified for IkB-a, -b, -e, and -f, and the Bcl3 proteins were relatively similar to the flexible residues identified from MD simulation studies [2]. In most cases, domain movement correlated well with the experimental observations.
In the case of IkBa, large domain movement was observed at the C-terminal end, which encompasses the PEST motif. This region plays a major role in disrupting the p50/p65-DNA complex and retaining the IkBa-p50/p65 complex in the cytoplasm [58]. At the N-terminus, we noticed a few flexible regions in areas that are important for masking the NLS of the p50 and p65 subunits [60,61]. Apart from these regions, fluctuating residues were also observed in the third finger loop region, the function of which remains unknown (Figures 7, 8, and 9 and Figure S3). The domain motion identified in IkBb was similar to that in IkBa (Figures 7, 8  and 9 and Figure S3). The function of the N-terminal region is to mask the NLS of the p65/p65 homodimer [62]. Although the crystal structure of IkBb bound to the p65/p65 homodimer is available, and is largely similar to that of IkBa, the high-mobility PEST region projects away from the protein-protein surface, clearly indicating that it has a different function. Recent biochemical studies have shown that the S313 and S315 residues in the PEST motif undergo phosphorylation and interact functionally with the cytoplasmic inhibitor c-Rel [67]. Apart from these, the fifth finger loop region also showed mobility; however, this region does not participate in the interface. Among the cytoplasmic IkBs, IkBe contains 7 ANK repeats, in common with nuclear IkBs; however, its dynamic motion was similar to that of the above-described cytoplasmic IkBs (Figures 7, 8 and 9 and Figure S3). The function of the N-terminal region is to mask the NLS of the p50 and p65 subunits, while the C-terminal region has high mobility and contains negatively charged amino acids that probably play a similar role to those in IkBa, with delayed kinetics [2,68].
IkBNS exhibited motion at the N-terminus and in the Cterminal finger loop regions (Figures 7, 8, and 9 and Figure S3), which play an important role in interacting with the p50/p50 dimerization interface [2]. Unexpectedly, we noted loop movement within ANK4; further biochemical studies are required to clarify the function of this region. Among the nuclear IkBs, IkBf possesses numerous functions, but the motion predicted by the current study was in the N-terminal and C-terminal regions. The N-terminus contributes to the binding of the p50 and p65 subunits, whereas the C-terminal regions are known to be involved in binding the p50/p50 homodimer [2]. Moreover, the theoretical Bfactor identified 2 additional regions exhibiting motion in the fourth and fifth finger loops ( Figure 6). Of these, the fourth finger loop region is responsible for its interaction with the p50 and p65 subunits, whereas the fifth finger loop region contributes to its interaction with the p50/p50 homodimer. In the case of Bcl3, the function of the identified N-terminal cluster remains largely unknown, whereas the C-terminal mobility region is important for its binding with p50 subunits and the subsequent removal of the Bcl3-p50/p50 complex from DNA (Figures 7, 8, and 9 and Figure  S3), resulting in the binding of an active NF-kB dimer to DNA [2]. Generally, nuclear IkBs are produced along with the products of the primary response genes.
NMA showed a similar pattern of domain motion for all IkB family members, although the residues involved in the motion of the domains are not evolutionarily conserved. However, we identified 2 amino acids (D73 and D75) in the IkBa N-terminal region whose corresponding positions were highly conserved among IkB members. In the case of cytoplasmic IkBs, these amino acid side chains point toward the NLS of the NF-kB subunits, thereby retaining the IkB/NF-kB complex in the cytoplasm [2,6,60,61,62]. However, in nuclear IkBs, these acidic residues point in the opposite direction, indicating that nuclear proteins do not have any functional influence over the NLS [2]. This clearly shows that conformational changes take place in these protein structures that keep the side chains of particular amino acids in an appropriate position, and are thus responsible for specific biological functions.

Discussion
A wide variety of microbial components activate NF-kB, which plays an essential role in the optimal activation of the host immune system. The transcriptional activity of NF-kB is tightly regulated at multiple steps of the immune signaling pathways because its excessive activation is harmful to the host [16]. The activity of NF-kB is primarily controlled by a family of regulatory proteins called inhibitory IkB proteins. We have previously investigated the structural basis for the activation and inhibition of IkB proteins [2,38]. However, in our current study, we performed a comprehensive bioinformatics analysis of the entire IkB family and identified their evolutionary, structural and functional relation- ships. Here, we have shown that the IkB family is much larger than previously anticipated [28] and it is composed of 11 members, namely, Relish, NF-kB1, NF-kB2, Cactus, IkBa, IkBb, IkBc, IkBe, IkBf, IkBNS and Bcl3. Even though IkBc is an IkB protein, we have not included it in our analysis since IkBc corresponds to the C-terminal part of NF-kB1, which contains ANK repeats that play a potent role in auto-inhibition and is eliminated during maturation. On the basis of our phylogenetic analysis, we identified Relish as the ancestor of IkB family members, which is in agreement with previous reports [8,28]. Our phylogeny analysis organized the IkB subfamily members into 5 distinct clusters (Figure 2). In cluster 1, NF-kB1/NF-kB2 subclades were first separated from the ancient Relish subfamily by gene duplication. Relish, NF-kB1 and NF-kB2 genes contain 7 ANK repeats in their C-terminal regions and an RHD in their N-terminal regions. In addition to the identification of Relish as an early offshoot, the presence of both RHD and ARD suggests that the Relish genes may have diverged before the NF-kB genes as a result of an early gene duplication, which is in agreement with the findings of Huguet et al. [28]. IkBa and Cactus (Drosophila homolog) genes contain 6 ANK repeats representing the second cluster. Even though we can hypothesize that Cactus/IkBa represent paralogous genes from the phylogenetic tree, this is not possible since we were unable to identify a Cactus gene in vertebrates. Hence, it is clear that no vertebratespecific gene duplication has given rise to a paralogous version of the IkBa gene, but the discovery of a new paralog is still possible. Clusters 3 and 4 represent the IkBe and IkBb subfamilies; however, the relationship of these subtypes with other IkB family members remains unclear. Yet, their structures (the presence of 6-7 ANK repeats) and regulation are very close to those of IkBa; hence, these IkBs fall into the cytoplasmic IkB proteins. Cluster 5 is represented by nuclear IkB proteins such as Bcl3, IkBf and IkBNS. Unlike Cactus/IkBa, vertebrate-specific gene duplication has given rise to the paralogous genes IkBNS and IkBf.
Our phylogenetic analysis suggests that (a) the acquisition of the RHD in an unknown ancestor containing the ANK motif gave rise to the IkB genes (Relish, NF-kB1 and NF-kB2) and that (b) at a later evolutionary stage, a few unknown IkB genes lost their RHD and evolved into different IkB subfamilies such as Cactus, IkBa, b, e, f, IkBNS and Bcl3. This is in line with previous studies demonstrating 2 models of evolution for the IkB proteins [28,57].   The ANK domain appears to be more ancient than the RHD due to the presence of ANK repeat proteins in bacteria; hence, it is certain that this protein-protein interaction domain existed long before the development of the NF-kB pathway [57]. Considering all of these observations, we believe that the 8 IkB proteins in mammals arose due to the requirements of IkBs for different NF-kB/Rel complexes, different biological functions such as coactivators or inhibitors of NF-kB/Rel complexes, and differential tissue expression patterns. Overall, our phylogenetic data indicate that the IkB subfamily members evolved by several gene duplication events, divergence and acquisition or scission events in specific lineages, which is in agreement with the previous finding that large-scale gene duplications have occurred during chordate evolution [69,70,71,72]. Thus, our phylogenetic analysis is conclusive about the relationships of the different IkB proteins.
Processes such as pseudogene formation [73], subfunctionalization [74], and neofunctionalization [75] after gene duplication may result in altered functional constraints between members of a gene family. In order to identify the functional divergence among the IkB subfamilies, we performed intensive type I and II functional divergence analyses (Figures 1 and 4). These analyses showed that type I rather than type II functional divergence is the main pattern for the functional divergence of the IkB subfamilies (Tables 2 and 3). In this study, the divergence of amino acid residues among the different IkB subfamilies provided us with an indication that the IkB genes may have diverse physiological functions and partner binding specificities, which is in agreement with the findings of several biochemical and structural studies [2,6,18,19,20,21,22,23,25,26,27,38,61,62]. To further characterize the relationship between the site-specific evolution of amino acids and functional divergence, the potential amino acid sites related to positive selection and type I and II functional divergence were selected and mapped on to the reference IkBa structure as well as the sequence alignment. The analysis of type I and II functional divergence showed that nearly three-quarters of the functional divergence sites are distributed throughout the ARD finger loop regions, but a few are also present in the helical region, suggesting that IkB genes are functionally divergent and exhibit diverse binding specificities with NF-kB/Rel subunits (Figures 1  and 4). However, in the analysis of selective pressure for the IkB genes, site-specific positive selection amino acids were not identified in the functional ARD region but rather they were distributed in the outer regions of the ARDs, suggesting that the IkB proteins (ARDs) have undergone strong purifying selection and are functionally significant.
In addition to these evolutionary and functional analyses, we were also interested in exploring the differences accountable for the functional divergence in IkBs; hence, structural studies of the IkB ARDs were conducted. Our PCA analysis of ARD structures ( Figure 5) correlated well with the structural superimposition studies demonstrating that even though the structures are similar, major variations took place primarily in the number of ANK repeats and in the finger loop regions, which are primarily responsible for the observed functional divergence among IkB family members. In order to check for the functionally relevant motion and directionality of the IkBs, we utilized NMA. NMA has been successfully used in determining domain motions and their collective nature for various types of proteins, including hexokinases [76], RNA polymerases [77], lysozymes [78], citrate synthetase [79], adenylate kinase [80], and icosahedral virus capsid proteins [81]. Our NMA analysis identified mobile regions that are located on the loops connecting the regular secondary structural elements (Figures 7, 8, and 9 and Figure S3). Since the loops have a higher degree of freedom than the highly packed regions (i.e., helices), they are usually anticipated to have high mobility. Nevertheless, the slowest modes identified from our ENMs were highly specific; only certain loops showed large domain movement or motion, which suggests that these mobile loops may have functional roles in the interaction with different NF-kB/Rel subunits. Indeed, a few residues identified in the loop region contribute to the binding interface with NF-kB dimers, which are in agreement with previous biochemical and structural studies [2,6,18,19,20,21,22,23,25,26,27,38,61,62]. Further mutational or deletion experiments are required to identify which specific residue sites on the loops are essential for the functional specificity exhibited.
Moreover, it should be noted that although we did not observe any conserved residues contributing to domain motion across all of the IkB genes, we noticed some conspicuous conservation among cytoplasmic and nuclear IkBs. These conserved residues exhibiting domain motion are probably the driving force for binding with the same NF-kB units that were observed in a few cytoplasmic and nuclear IkBs. However, different IkB homologs have different degradation and resynthesis rates and thus regulate NF-kB activation with unique kinetics after a particular stimulus [1,82,83].
In conclusion, this study may help to understand the phylogeny and structural and functional divergence of IkB family members more clearly. Gene duplications, gene divergence, acquisition, or scission events of certain domains that depend on the evolutionary rate and purifying selection are the primary evolutionary forces involved in the generation of the IkB family. Structural and functional analyses support the hypothesis that the IkB proteins have evolved distinctive functional properties due to the variations in their ANK repeats, high fluctuations observed in their ARD finger loop regions, and the divergence of amino acid sequences among different IkB subfamilies.
Our current study represents the first thorough examination of the evolution, structure, and function of the IkB family. Our evolutionary analysis proposes that all IkB subfamilies have diverged and been duplicated from a single, common, ancestral RHD-containing ANK gene. We utilized 3D modeling to demonstrate that all the IkB proteins share similar structural folds and are likely to retain similar functions. However, in our structural and functional analyses, the divergence of amino acid sequences among different IkB subfamilies, variation in the number of ANK repeats, and higher flexibility in the finger loop regions provided us with an indication that the IkB family genes may have diverse physiological functions and distinct binding affinities for different NF-kB/Rel partners. Overall, our study demonstrates the utility of a multidisciplinary approach that combines insights from the evolutionary origins of IkBs with computational methods, leading to the prediction of structural and functional divergence that can be tested at the molecular level.