Intermolecular β-Strand Networks Avoid Hub Residues and Favor Low Interconnectedness: A Potential Protection Mechanism against Chain Dissociation upon Mutation

Altogether, few protein oligomers undergo a conformational transition to a state that impairs their function and leads to disease. But when it happens, the consequences are severe, and the so-called conformational diseases pose serious public health problems. Notorious examples are Alzheimer's disease and some cancers, associated with a conformational change of the amyloid precursor protein (APP) and of the p53 tumor suppressor, respectively. The transition is linked with the propensity of β-strands to aggregate into amyloid fibers. Nevertheless, a huge number of protein oligomers associate chains via β-strand interactions (intermolecular β-strand interfaces) without ever evolving into fibers. We analyzed the layout of 1048 intermolecular β-strand interfaces, looking for features that could give the β-strands resistance to conformational transitions. The interfaces were reconstructed as networks with the residues as the nodes and the interactions between residues as the links. The networks followed an exponential decay degree distribution, implying an absence of hubs and a predominance of nodes with few links. Such a layout provides robustness to changes. Few links per node do not restrict the choice of amino acids capable of making an interface, which maintains high sequence plasticity. Few links reduce the "bonding" cost of making an interface. Finally, few links moderate the vulnerability to amino acid mutation because they entail limited communication between the nodes. This confines the effects of a mutation to a few residues instead of propagating them to many residues via hubs. We propose that intermolecular β-strand interfaces are organized in networks that tolerate amino acid mutation to avoid chain dissociation, the first step towards fiber formation. This is tested by looking at the intermolecular β-strand network of the p53 tetramer.


Introduction
Some proteins function as oligomers by associating several copies of the same chain (homo-oligomers) or of different chains (hetero-oligomers). Chain association takes place through the formation of protein interfaces involving interactions between atoms of the amino acids of adjacent chains. Such intermolecular amino acid interactions are extensively studied by both experimental and computational approaches [1][2][3][4][5]. Alanine scanning mutagenesis has shown that only some of the amino acids of the interface account for the binding free energy [6]. Thus, there exists a subset of amino acids at interfaces, referred to as "hot spot" amino acids, which are relevant for chain association. This discovery has led to the development of many computational tools aimed at identifying hot spots. The amino acids essential for interface formation are now known colloquially as hot spots, without necessarily implying validation by alanine scanning.
Some proteins have the fold plasticity to undergo a transition from one oligomeric state to another. Of particular interest are the cases where the new oligomeric state impairs the protein function and leads to pathologies called protein misfolding diseases or conformational diseases. This transition is responsible for severe human diseases such as Alzheimer's disease (Aβ-amyloid), Parkinson's disease (synuclein) and cerebral amyloid angiopathy (cystatin C-amyloidosis). It is important to emphasize that the phenomenon is not restricted to neurodegenerative diseases but extends to cancer (p53), type II diabetes (IAPP, amylin), cardiovascular diseases (transthyretin, serpin) and inflammatory diseases (serpin) (reviewed in [7][8][9][10][11]). In this list, the protein undergoing the transition is indicated in brackets after each disease. A priori, these diseases are unrelated and the culprit proteins do not share biological function or primary, secondary, tertiary or quaternary structure (initial or final). So the occurrence of the transition ought to be related to a local fold plasticity that allows transitions between different oligomeric states. It could be secondary structure plasticity, as observed for the DIII loop of pore-forming toxins, which becomes a β-hairpin and promotes the toxin's oligomerization, or tertiary structure plasticity, like the movement of the so-called "hinge loop", which leads to the formation of dimers or higher oligomeric states via a domain swapping mechanism [12][13][14][15].
The involvement of a local fold in the transition is in good agreement with the presence of a common structural motif in the pathological form of the culprit proteins. The pathological form, whether a fiber or an oligomer, involves interactions between two β-strands, each provided by a different chain (intermolecular β-strands). These intermolecular β-strands share several structural properties. They are recognized by the same antibody, A11 [16]. Their formation depends on interactions between atoms of the backbone, a result which has led to the proposal that aggregation is a generic property of the polypeptide chain [17,18]. They adopt a cross-β structure which can be predicted from sequences by the PIRA (Parallel In-Register Arrangement) model, a network made of single pairs of residues [19][20][21][22][23][24]. Different predictors of the aggregation-prone sequences involved in fiber formation are now available [25][26][27][28][29][30].
Nevertheless, intermolecular β-strands are common in protein oligomers that are not known to undergo a transition to pathological assemblies. This suggests that there is a protection mechanism that prevents some intermolecular β-strands from undergoing the transition. We are interested in identifying the features that determine the vulnerability of intermolecular β-strands to a transition to pathological assemblies. The intermolecular β-strand interactions that occur in conformational diseases are often referred to as "aberrant" interactions because they lead to a loss of protein function and finally to the disease, while the intermolecular β-strand interactions that occur in "healthy" protein oligomers are referred to as "functional" interactions.
Previous studies, mainly in dimers, have shown that the frequencies of individual amino acids in intermolecular β-strands and in intramolecular β-strands are not different [31]. Yet we have reported that the intermolecular β-strands of oligomers larger than dimers have a scattered charge distribution, in contrast to intramolecular β-strands and "aberrant" β-strands, which have charges confined to their C- and N-terminal extremities [26,32,33]. Edge β-strands have centrally located charges which prevent their aggregation, an explanation that holds for intermolecular β-strands as well [34]. In our study, the individual hot spots did not have any features that could account for a transition from "functional" to "aberrant" β-strand interactions. Because of the small size of the dataset (40 intermolecular β-strands), it was not possible to investigate the properties of the hot spot pairs or of the layout of the interactions between hot spots.
We have now built a larger dataset of 1048 intermolecular β-strands, enabling us to explore such properties. The results show that the hot spots are not matched randomly but according to chemical and geometrical properties of the side chains of the amino acids. The role of the geometry is novel and might open new avenues to understand how intermolecular β-strands are formed. The main result is that the interactions between hot spots are organized to resist the effects of amino acid mutation, possibly avoiding in this way chain dissociation upon mutation, the first step towards fiber formation.

Results
The goal is to describe features of the hot spots involved in intermolecular β-strands and to consider how they may participate in a transition from "functional" to "aberrant" interactions. The intermolecular β-strands are represented as networks of interacting hot spots, with hot spots as nodes and interactions as links. Vocabulary related to graph and network theories is provided in Methods.

Figure 1. [...] B. Closest atoms. For every atom of S1, Gemini chooses the closest atoms on S2 (left picture) and for every atom of S2, Gemini chooses the closest atoms on S1 (right picture). The closest atoms are encircled. C. Mutually closest atoms. Gemini selects the atoms that are mutually the closest. The amino acids to which the mutually closest atoms belong are indicated by big filled circles. R stands for residue and the subscript is the position of the amino acid in the sequence. D. Gemini graph of amino acids in interaction. The distances between amino acids in contact are now arbitrarily fixed to the same value because the information on the "real" interatomic distances is lost. The pair of residues R99 and R25 is a single pair of amino acids (k = 1, that is, one link connecting two residues). The residue R96 is a multiple contact amino acid because it is involved in two single pairs, one with R29 and the other with R27. doi:10.1371/journal.pone.0094745.g001

The tool Gemini
The nodes and the links of the networks are identified by our tool Gemini. Gemini has been described previously, hence we only briefly recall how the networks are built [35,36]. Each chain of a protein oligomer is considered as a set of points in space whose positions are the Cartesian coordinates (x, y, z) of the atoms of the chain. The coordinates can be downloaded from the PDB. The atoms of chain 1 constitute set 1 (S1) and the atoms of chain 2, set 2 (S2).
Gemini calculates the distances between every atom of S1 and every atom of S2 (interchain distances) but ignores the distances between atoms of a single set (intrachain distances) (Fig. 1A). Gemini chooses the closest atoms (Fig. 1B) and, among them, retains only the pairs of mutually closest atoms (Fig. 1C). In other words, Gemini starts from an atom X1 of S1 and walks to its closest atom X2 on S2. It then checks that, coming back to S1 by the shortest distance, it retraces its step to X1. If not, the pair of atoms (X1, X2) is discarded, as for example the pair (A1, B2) in Figure 1C. The pairs of atoms that are mutually closest are considered to be interacting. At this stage the interchain interactions are symmetrical and the interface is referred to as around symmetrized [35]. In the last step, the pairs of atoms are replaced by their respective amino acids and a coarse-grained graph of amino acids in interaction is produced (Fig. 1D). Every amino acid has k interactions or k links, where k equals the number of atoms involved in a contact. There are single pairs of amino acids (k = 1, that is, one link connecting two residues), multiple pairs of amino acids (k links connecting two residues) and multiple contact amino acids (an amino acid with k links to distinct amino acids).
Because only mutually closest atoms are retained, Gemini produces a graph of amino acids in interaction which is essentially a framework of interactions rather than the set of all possible interactions. The amino acids selected by Gemini are detected as hot spots by available programs, which shows that defining an interface based only on geometry is robust and accurate in picking up relevant amino acids [35]. Importantly, Gemini does not need a cut-off distance to select atoms of the interface, as is classically done, for example to select preferentially backbone or side chain atoms. In this way Gemini avoids the variability of the selection inherent in the choice of a cut-off [37]. Gemini naturally selects backbone and/or side chain atoms as part of the interface according to the geometry of the interface. Note that Gemini is applicable to any set of points in any metric space and can be used beyond the problem addressed in this paper.
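The selection of mutually closest atoms and the coarse-graining into residue links described above can be illustrated with a minimal sketch. This is not the Gemini implementation: the function names, the toy coordinates and the residue labels are all hypothetical, and a brute-force search is assumed for simplicity.

```python
from collections import Counter
from math import dist


def mutually_closest_pairs(s1, s2):
    """Return index pairs (i, j) such that s2[j] is the closest point of S2
    to s1[i] AND s1[i] is the closest point of S1 to s2[j]."""
    nearest_in_s2 = [min(range(len(s2)), key=lambda j: dist(a, s2[j])) for a in s1]
    nearest_in_s1 = [min(range(len(s1)), key=lambda i: dist(b, s1[i])) for b in s2]
    # Keep a pair only if walking back from S2 retraces the step to the same atom.
    return [(i, j) for i, j in enumerate(nearest_in_s2) if nearest_in_s1[j] == i]


def residue_links(atom_pairs, residues1, residues2):
    """Replace atoms by their residues; the count is the number of atom pairs
    (links) connecting two residues."""
    links = Counter()
    for i, j in atom_pairs:
        links[(residues1[i], residues2[j])] += 1
    return links


# Toy example (made-up coordinates and labels, not taken from a PDB entry).
S1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
S2 = [(0.0, 1.0, 0.0), (5.0, 1.0, 0.0)]
RES1 = ["r1", "r1", "r2"]   # residue of each atom of S1
RES2 = ["r3", "r4"]         # residue of each atom of S2

pairs = mutually_closest_pairs(S1, S2)
graph = residue_links(pairs, RES1, RES2)
```

In this toy case the second atom of S1 points to the first atom of S2, but that atom points back to the first atom of S1, so the pair is discarded. No cut-off distance appears anywhere in the selection.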

The dataset
The PDB files of 755 protein oligomers containing at least one intermolecular β-strand interface are extracted from the RCSB (biological assemblies) and in total 1048 intermolecular β-strand networks are constructed with Gemini. It is a non-redundant dataset of oligomers assembling three (trimers) to twelve subunits (dodecamers). The oligomers are selected only on the presence of intermolecular β-strands since we are looking for elements relevant to the formation of the interface but not to the formation of the whole chain. To fit that condition and alleviate the pressure of evolution due to fold or function similarities, we need a dataset with high diversity in terms of the features of the whole chains. The 755 protein oligomers classify into 234 SCOP families and 30 distinct functions, are produced by organisms from the three domains of life and have on average a full chain length of 206 ± 140 amino acids (average ± standard deviation) [38][39][40]. On the contrary, we need a narrow diversity in terms of the features of the intermolecular β-strands to give evidence of a common construction mechanism. The average length of the intermolecular β-strands is 18 ± 13 amino acids, the length being calculated as the sum of the amino acids of the two β-strands. The distribution of the whole chain lengths is broader than that of the β-interface lengths (Fig. 2A). The intermolecular β-strands have on average 13 ± 8 hot spots; 75% have fewer than 16 hot spots and 25% have between 30 and 77 hot spots. Likewise, there are on average 12 ± 8 hot spot pairs per interface; 75% of the interfaces have fewer than 15 hot spot pairs while 25% have between 25 and 50 (Fig. 2B, inset). The number of hot spot pairs in the intermolecular β-strands is compared to the total number of hot spot pairs over the whole interfaces to assess the diversity of intermolecular β-strands in terms of the number of interactions necessary to build them.
The distribution of the number of pairs in intermolecular β-strands is narrower than in the whole interface (Fig. 2B). Globally, 75% of the dataset have intermolecular β-strands sharing features. Moreover, there is no correlation between the length of the whole chain and the length of the intermolecular β-strands (not shown, R = 0.03), supporting the idea that the two objects have independent features.
The dataset contains 568 anti-parallel β-sheets, 132 parallel β-sheets and 348 other β-strand arrangements (close-packed β-strands), and 60% of the cases have β-strands with distinct sequences. One can already anticipate that the intermolecular β-strands of the dataset cannot be predicted based on a network of pairs of residues following a Parallel In-Register Arrangement (PIRA) because only 12% are parallel β-sheets and most β-strands have non-identical sequences. The global features already highlight a network arrangement different from that of aggregation-prone sequences [25].
Analysis of the properties of the residues in interaction in intermolecular β-strands
Gemini labels backbone and side chain atoms of the amino acids such that it produces two sub-graphs: one involving pure backbone interatomic interactions (BB networks) and the other involving interactions with at least one atom of the side chain (SC networks). We have shown that this distinction is necessary to exhibit features of intermolecular β-strands [32]. This is certainly related to the involvement of the backbone interactions in the hydrogen bond network of the β-sheets, while in α-helices such backbone interactions are involved intramolecularly and do not interfere with intermolecular interactions. This is in good agreement with previous reports that side chain and backbone interactions are involved in hydrophobic and hydrogen bonding, respectively [1,41,42].
First, the properties of the individual hot spots are analyzed. Totals of 704623, 10692 and 5950 amino acids are observed in the whole chains, the SC hot spots and the BB hot spots, respectively. These sample sizes support the reliability of the statistics, which improves with the size of the sample. The amino acid frequencies are indicated in Table 1 and used to measure the average chemical property and the global (GP) and local (LP) propensities of the amino acids (Tables 2 and 3, respectively). As observed previously, the SC and BB hot spots have average chemical properties similar to the amino acids of the whole chains, global propensities and local amino acid distributions coherent with sequences made of β-strands, as well as a scattered charge distribution [32]. Namely, residues with high β-sheet propensity (F, W, Y) are significantly more frequent while residues with low β-sheet propensity (G and A) are significantly less frequent [43][44][45]. The β-strand extremities are enriched in β-breaker amino acids (P and G) while high β-sheet propensity residues (V, L) are enriched centrally [46,47]. When the local preferences of the SC charged residues are considered, the charged residues R, K and E are enriched at the β-strand extremities whereas H and D residues are more frequent centrally (Table 4).
Second, the properties of the pairs of hot spots in interaction are analyzed. Because most of the intermolecular β-strands are not made of β-strands with an identical sequence, the occurrences n_ab and n_ba are initially counted separately. A χ² test calculated over the occurrences n_ab and n_ba shows that the differences are insignificant, and so the n_ab and n_ba occurrences are summed (Tables 5 and 6). The test ignored the values for the pairs of identical residues, for which a equals b. There are 10551 SC pairs and 5894 BB pairs, again highlighting the reliability of the statistics. The frequencies of the hot spot pairs f_ab are calculated with equation (1) and shown in the tables constituting Figure 3.

Table 3. Global propensities and local preferences of the SC and BB hot spots. (Columns: amino acid; SC hot spots; BB hot spots.)
If the frequency f_a is independent of the frequency f_b, the ratio f_ab/(f_a·f_b) is equal to one (Tables 7 and 8). Overall the hot spots are not matched randomly since 70% and 66% of the BB and SC pairs, respectively, have a ratio that deviates from one. It is therefore necessary to measure the pair frequencies because they cannot be simply derived from the frequencies of individual hot spots.
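The independence test behind the ratio f_ab/(f_a·f_b) can be sketched as follows. Equation (1) is not reproduced here, so the normalisation is an assumption: each observed pair is counted half as (a, b) and half as (b, a), which mirrors the summing of n_ab and n_ba, and f_a is taken as the marginal of the pair frequencies. The function name and the toy pair lists are hypothetical.

```python
from collections import defaultdict
from math import isclose


def pair_ratios(pairs):
    """Return {(a, b): f_ab / (f_a * f_b)} for a list of residue-type pairs.

    Assumed normalisation (not taken from the paper): symmetrised ordered
    pair frequencies, with f_a computed as the marginal sum over b of f_ab.
    """
    n = len(pairs)
    f_pair = defaultdict(float)
    for a, b in pairs:
        # Split each observation between the two orientations of the pair.
        f_pair[(a, b)] += 0.5 / n
        f_pair[(b, a)] += 0.5 / n
    f_res = defaultdict(float)
    for (a, _b), f in f_pair.items():
        f_res[a] += f
    return {ab: f / (f_res[ab[0]] * f_res[ab[1]]) for ab, f in f_pair.items()}


# Toy data: random-like matching gives ratios of one ...
independent = pair_ratios([("V", "V"), ("V", "L"), ("L", "V"), ("L", "L")])
# ... whereas exclusive self-pairing gives ratios above one.
self_pairing = pair_ratios([("V", "V"), ("L", "L")])
```

A ratio above one marks a pair that occurs more often than expected from the individual frequencies, and a ratio below one marks a pair that occurs less often.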
To evaluate whether the distinction between SC and BB hot spots is also relevant at the level of the pairs, the frequencies of the SC pairs are plotted against the frequencies of the BB pairs (Fig. 4). On the diagonal, there are 50 pairs out of a total of 210, indicating that 76% of the BB and SC pairs have different frequencies. It is therefore important to investigate them separately. Subsequent analyses are performed using quartiles to take into account the observation that 75% of the intermolecular β-strands share similar global interface features while 25% are more heterogeneous. The amino acids with the highest 25% pair frequencies (> quartile Q3) are considered as preferred contacts (Fig. 3, red) whereas those with the lowest 25% pair frequencies (< quartile Q1) are considered as avoided contacts (Fig. 3, green). The neutral contacts have frequencies between Q1 and Q3 (Fig. 3, white). The Q3 and Q1 of the SC hot spot pairs are 6.0 × 10⁻² and 2.2 × 10⁻³, respectively. The Q3 and Q1 of the BB hot spot pairs are 6.7 × 10⁻² and 1.7 × 10⁻², respectively. In both networks, on average every amino acid pairs with 5 other types of amino acids out of its twenty pairing possibilities. The most preferred contacts are defined as amino acids which pair with a frequency above Q3 with more than five other types of amino acids. For both networks, the most preferred contacts are I, L, V, S and T, similarly to what was found for intermolecular β-strands in dimers [31]. On the other hand, compared to the dimers, F and Y residues are preferred in the SC networks while A and G are preferred in the BB networks; the residue E is preferred in both. Likewise, the most avoided contacts are defined as amino acids which pair with a frequency below Q1 with more than five other types of amino acids. For both networks, the most avoided contacts are with C, M, W and H residues, similarly to intermolecular β-strands in dimers.
In addition, contacts with A, G and Q are avoided in the SC networks while contacts with N and Q residues are avoided in the BB networks.
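The quartile-based labelling of preferred, neutral and avoided contacts can be sketched as below. The pair frequencies are invented toy values, the function names are hypothetical, and the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the source does not state which convention was used.

```python
from collections import Counter
from statistics import quantiles


def classify_pairs(pair_freqs):
    """Label each pair 'preferred' (> Q3), 'avoided' (< Q1) or 'neutral'."""
    q1, _median, q3 = quantiles(pair_freqs.values(), n=4)
    labels = {}
    for pair, f in pair_freqs.items():
        if f > q3:
            labels[pair] = "preferred"
        elif f < q1:
            labels[pair] = "avoided"
        else:
            labels[pair] = "neutral"
    return labels


def partner_counts(labels, wanted):
    """For each amino acid, count the OTHER amino acid types it pairs with
    under the given label (self-pairs are not counted)."""
    counts = Counter()
    for (a, b), label in labels.items():
        if label == wanted and a != b:
            counts[a] += 1
            counts[b] += 1
    return counts


# Toy frequencies (made up for illustration only).
toy = {("V", "L"): 0.08, ("V", "I"): 0.07, ("L", "I"): 0.06, ("A", "G"): 0.05,
       ("S", "T"): 0.04, ("E", "K"): 0.03, ("C", "M"): 0.02, ("W", "H"): 0.01}
labels = classify_pairs(toy)
preferred_partners = partner_counts(labels, "preferred")
```

With real data, an amino acid would count as a most preferred contact when its entry in `preferred_partners` exceeds five, following the definition given in the text.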
The features of the hot spot pairs are then analyzed considering the chemistry and the geometry of the amino acids (Tables 9 and 10, respectively). Both SC and BB hot spot pairs have similar tendencies for contacts with hydrophobic residues, but contacts with polar and charged residues are twice as frequent in the SC pairs. Even more striking differences are found for the contacts between two charged residues, between two polar residues, or between one charged and one polar residue, which are at least ten times more frequent in the SC networks. Considering geometrical properties (length of the side chains), the contacts with long and medium residues are significantly more frequent in the SC pairs than in the BB ones, which on the contrary favor contacts between short side chain residues.
Third, the number of contacts of the hot spots is counted to determine whether the hot spots have multiple contacts. The BB networks have as many single-contact hot spots (2941) as two-contact hot spots (2993) but very few three-contact hot spots (12). The degree distribution P(k) is equal to the ratio of the number of hot spots with k contacts to the total number of hot spots. For the BB networks, P(k) has a bell-like shape with an average number of contacts ⟨k⟩ equal to 1.5 (Fig. 5A). On the other hand, P(k) for the SC networks falls on a straight line when plotted on a linear-log scale, indicating an exponential decay, a variation from the power law distribution observed for real networks [48] (Fig. 5A, R² = 0.99). The average number of contacts ⟨k⟩ of the SC hot spots is 1.4.
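The degree distribution P(k) and the linear-log test for exponential decay can be illustrated with a short sketch. The BB counts are the ones quoted above; the SC degree list is synthetic (the per-k SC counts are not given in the text), and the function names are hypothetical. An ordinary least-squares fit of ln P(k) against k is assumed as the way to check the straight line.

```python
from collections import Counter
from math import isclose, log


def degree_distribution(degrees):
    """P(k) = number of hot spots with k contacts / total number of hot spots."""
    counts = Counter(degrees)
    return {k: counts[k] / len(degrees) for k in sorted(counts)}


def mean_degree(counts):
    """Average number of contacts <k> from a {k: number of hot spots} table."""
    return sum(k * n for k, n in counts.items()) / sum(counts.values())


def fit_exponential(pk):
    """Least-squares line through the points (k, ln P(k)).
    Returns (slope, intercept, r_squared); a straight line means P(k) ~ exp(slope * k)."""
    xs = list(pk)
    ys = [log(p) for p in pk.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# BB counts quoted in the text: 2941 single, 2993 double, 12 triple contacts.
bb_mean = mean_degree({1: 2941, 2: 2993, 3: 12})

# Synthetic degrees halving at each k, i.e. an exact exponential decay.
synthetic = [1] * 8 + [2] * 4 + [3] * 2 + [4] * 1
slope, intercept, r2 = fit_exponential(degree_distribution(synthetic))
```

For the BB counts the average comes out at about 1.5, in line with the value reported for Fig. 5A. For the synthetic list the fit returns a slope of −ln 2 and an R² of one.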
To determine a prototype intermolecular β-strand network, we use a binomial model with 9 amino acids per strand, 6 hot spots and the probability p = 0.16 of having a contact (see Methods for the definition of a binomial law). These values are based on the averages of 18 amino acids, 12 hot spots and 10 links per interface measured over the dataset. A fully connected graph of 9 amino acids per strand (every amino acid of one strand linked to every amino acid of the other) would have 81 links (9 by 9) and so, in total over the dataset of 1048 interfaces, 84888 links. Only 13628 links are measured, thus the probability p of making a contact (having a link) is equal to 0.16 (13628/84888). Assuming that the amino acids have a uniform distribution of links (i.e. all amino acids have the same probability of making a link), the binomial model calculates a prototype network with 21% of non-connected amino acids (not hot spots), 36% of amino acids with one contact and 43% of amino acids with more than one contact; 27% of amino acids would have two contacts and 12% would have three. The observed data indicate 49% of amino acids with one contact and 51% with more than one contact: 33% with two, 14% with three and 4% with more than three contacts. The observed data are measured on hot spots only and so do not take into account the non-connected amino acids. In the binomial model, the "hot spots", namely the amino acids with a link, are 79% (36% with one contact and 43% with more than one contact).

Table 4. Local preferences of the charged amino acids in the SC hot spots.

We then examined whether the hot spots had unusual amino acid features according to their number of contacts. The frequency of a hot spot in multiple contacts is divided by its frequency in single contact to measure the amino acid propensity to have multiple contacts. This propensity is plotted against the respective number of atoms of the side chain. No correlation is found for the BB hot spots (not shown, R = 0.41) and only the branched residues V, I and L have a higher tendency to make two interactions, suggesting that they are enriched in intermolecular β-strands involving parallel β-strands. On the other hand, there is a good linear correlation for the SC hot spots (Fig. 5B, R = 0.8). Thus, the propensity of an SC hot spot to make contacts is proportional to the number of its side chain atoms. Lastly, the probability of having hot spots with more than three contacts (k > 3) is plotted against the number of atoms and compared to the probability of having a hot spot with one contact only (Fig. 5C). The probability of having hot spots with more than three contacts increases with the number of atoms whereas the probability of single-contact hot spots is distributed around 0.05. This probability (1/20) implies an identical chance for all amino acids to have a single contact, indicating no amino acid specificity for such a contact number. On the other hand, only residues with more than 14 atoms (F, Y, R and W) have a probability above 0.05 of making more than three contacts, with the exception of the residue K.
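The percentages of the binomial prototype network quoted earlier in this section can be checked numerically. The sketch below only uses the numbers stated in the text (9 amino acids per strand, 1048 interfaces, 13628 measured links); the variable and function names are hypothetical, and p is rounded to 0.16 as in the text.

```python
from math import comb

N_INTERFACES = 1048          # intermolecular beta-strand networks in the dataset
RESIDUES_PER_STRAND = 9      # prototype strand length
MEASURED_LINKS = 13628       # links measured over the whole dataset

# Fully connected 9-by-9 interface, summed over the dataset.
max_links = RESIDUES_PER_STRAND ** 2 * N_INTERFACES
p = round(MEASURED_LINKS / max_links, 2)


def binomial_pmf(k, n, p):
    """Probability that an amino acid has exactly k links out of n possible."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


prototype = {k: binomial_pmf(k, RESIDUES_PER_STRAND, p) for k in range(RESIDUES_PER_STRAND + 1)}
not_connected = prototype[0]                       # amino acids that are not hot spots
one_contact = prototype[1]
more_than_one = 1 - not_connected - one_contact

for label, value in [("k = 0", not_connected), ("k = 1", one_contact),
                     ("k = 2", prototype[2]), ("k = 3", prototype[3]),
                     ("k > 1", more_than_one)]:
    print(f"{label}: {100 * value:.0f}%")
```

Under these assumptions the sketch gives 84888 possible links and p = 0.16, and the printed percentages can be compared directly with the 21%, 36%, 27%, 12% and 43% stated for the prototype network.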

Discussion
The analysis of the individual hot spot properties confirms a scattered charge distribution on the β-strands and a central enrichment in high β-sheet propensity residues, more particularly branched side chain residues (V and L). This indicates that linear information, namely the information read on the sequence of the β-strands, codes essentially for solubility and regulation of the secondary structures.
Discriminating SC and BB interactions is again relevant at the level of the pairs as the SC and BB pair preferences diverge significantly. The ratio of SC to BB hot spots and the ratio of SC to BB pairs are on average around 2, indicating that the SC preferences are likely to have more influence over the intermolecular β-strands. One novel observation is that the pair matching is not only based on the chemistry of the amino acids but also on their geometry, as seen in the preferences for long or charged residues in the SC pairs and for small or hydrophobic residues in the BB pairs. There is even enrichment in pairs combining amino acid properties, such as pairs between long and charged residues or pairs between long and polar residues. In both SC and BB pairs, the branched residues V, I and L are preponderant contacts. A chemical-centric view of the pair matching is therefore inappropriate; in fact, the pairing calls upon the versatile properties of amino acids. It might be interesting to explore the role of geometrical parameters in the formation of intermolecular β-strands, experimentally and theoretically. For instance, one theoretical approach would be to use minimum Steiner trees, which offer a purely geometrical description of the amino acids, to determine whether the pair matching yields a minimum energy conformation of the interface [49]. Contacts between identical residues represent only around 10% of the total preferred contacts, indicating a minor role in the matching process. This differs from previous reports on dimeric intermolecular β-strands and from the prediction of a PIRA model [25,31]. The data show that the 2D information, namely the amino acid pairing, is not random and is important for the intermolecular β-strands, which is not surprising since β-strands are not viable without making interactions with another structural element.
The SC and BB networks differ not only in their amino acid pairing but also in their network features. The BB networks have nodes with one or two contacts, probably reflecting the hydrogen bond networks of anti-parallel (single contact) and parallel β-sheets (two contacts), respectively. The BB networks would essentially code for secondary structure interactions. The SC networks follow an exponential decay degree distribution and have nodes with one, two or three contacts but rarely with more than three. Thus the intermolecular β-strands result from the juxtaposition of two networks and the information for making the interface is encoded via a double layer of interactions. One layer is composed of the BB atoms and provides promiscuous interactions, namely low specificity in terms of amino acid composition and interaction motifs. The second layer is composed of the SC atoms, which on the contrary provide selective interactions, namely high specificity in terms of amino acid composition and interaction motifs. Such a double layer of interactions has been described for the interfaces between colicins and their immunity binding proteins as a way to evolve binding affinity [50]. There is also a precedent describing monomeric proteins and intramolecular amino acid interaction networks [51]. One network, based on short range interactions between Cα atoms, had a bell curve degree distribution (random network feature) whilst the other, based on long range interactions (side chain atoms), had an exponential decay degree distribution (single-scale network feature).
The exponential decay degree distribution likely reflects a network optimized to reduce the number of links, which is relevant because making a chemical link has a cost. Moreover, the data show that above three contacts the choice of amino acids becomes highly stringent, suggesting that a node with too many links, a hub, would seriously decrease the sequence plasticity available to successfully form an interface. Intermolecular β-strands are very plastic in terms of sequence requirements and therefore seem built to avoid hubs. Hubs are communication devices but also the Achilles' heel of the network: a modification of a hub spreads changes within the whole network because the hubs are connected to many nodes [52]. The propagation of changes upon node modification is called network rewiring [53]. The intermolecular β-strand networks, which lack hubs, are likely little inclined to rewiring because of their low interconnectedness. Counterintuitively, the robustness of intermolecular β-strands would appear to be based on a low occurrence of links, which maintains high sequence plasticity, cuts costs in terms of links and reduces vulnerability to changes (mutation).
Figure 3. Tables of the f_ab pair frequencies. A. Observed BB pair frequencies. B. Observed SC pair frequencies. The frequency f_ab is for pairs of hot spots ab read on the lines a and the columns b. The preferred (>Q3) and avoided (<Q1) pairs are indicated in red and green, respectively. The pairs with a frequency between Q1 and Q3 are not colored. The residues are ordered alphabetically within the hydrophobic, charged and polar groups. C. SC and BB pair distinction. The ratios of the frequency of a pair ab in the SC sub-networks to its frequency in the BB sub-networks are indicated. The pairs more frequent in the SC sub-networks are indicated in red (ratio >1.2) and the pairs more frequent in the BB sub-networks are indicated in green (ratio <0.8). For ratios ranging from 0.8 to 1.2, the pairs are not colored. The abbreviation n.a. stands for "not applicable", which corresponds to a division by zero; those pairs are more represented in the SC sub-networks. doi:10.1371/journal.pone.0094745.g003
Table 5. Observed BB pair occurrences.
Table 6. Observed SC pair occurrences.
Table 7. Ratio of f_ab/(f_a·f_b) for the BB hot spot pairs.
T  Y
A  2  4  3  2  2  3  1  3  1  2  1  1  3  2  1  2  2  1  1  4
C  4  3  2  3  1  0  0  3  0  1  0  2  1  2  0  1  3  1  3
F  2  2  2  1  3  1  2  2  1  2  3  2  2  2  1  1  3  2
G  1  2  2  1  6  2  2  1  2  2  1  2  2  2  2  3  2
I  2  2  3  1  2  2  2  2  1  2  1  1  1  2  1  2
L  2  2  2  3  2  1  2  2  1  1  1  2  1  2 3
Table 8. Ratio of f_ab/(f_a·f_b) for the SC hot spot pairs.
T  Y
A  2  3  3  1  1  2  3  2  2  4  1  1  2  1  1  2  1  1  2  3
C  2 4  1  2  3  0  1  2  3  0  1  0  1  2  1  0  1  3  0  3
F  3  2  2  3  3  2  2  2  1  2  1  1  1  2  1  1  2  2
G  0  1  1  1  3  1  2  2  1  2  2  3  2  4  2  2  2
I  3  3  3  1  2  1  1  1  1  1  1  1
It is tempting to speculate that a higher number of links is one of the necessary conditions for a transition from "functional" to "aberrant" intermolecular β-strands. It is possible that "healthy" protein oligomers which become pathological fibers have interfaces with more links per node and networks more sensitive to rewiring than those which do not form fibers. To examine this possibility, the tumor suppressor p53 tetramer (PDB 1SAK, Fig. 6A), a known case of a healthy oligomer undergoing a transition to a fiber, is considered. First, the Gemini graph of the WT p53 is generated (Fig. 6B). The greater occurrence of multiple contact residues is striking in the WT p53 network, supporting the hypothesis. The p53 hot spots have on average ⟨k⟩ = 3 contacts, twice the ⟨k⟩ value of the intermolecular β-strand networks. The p53 network has 33% of hot spots with more than three contacts, which is 6 times more than the prototype network. On the other hand, it has 25% of single contact hot spots, half as many as the prototype network. Consequently the interconnectedness is larger in the p53 network than in the prototype network.
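The comparison above rests on three summary statistics of a contact network: the mean degree ⟨k⟩, the fraction of hot spots with more than three contacts, and the fraction of single contact hot spots. A minimal Python sketch of how they could be computed, assuming the network is available as a list of residue-residue links (the `degree_summary` name and the toy links are hypothetical and are not p53 data):

```python
from collections import Counter

def degree_summary(links):
    """Return (mean degree, fraction with k > 3, fraction with k == 1)."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    mean_k = sum(degree.values()) / n
    frac_high = sum(1 for k in degree.values() if k > 3) / n
    frac_single = sum(1 for k in degree.values() if k == 1) / n
    return mean_k, frac_high, frac_single

# Toy star-shaped network (hypothetical): one node linked to four others.
toy_links = [("X1", "Y1"), ("X1", "Y2"), ("X1", "Y3"), ("X1", "Y4")]
mean_k, frac_high, frac_single = degree_summary(toy_links)
```

In this toy case the central node is the only one with more than three contacts, and the four peripheral nodes are single contact nodes.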
To look at the sensitivity of the p53 network to single point mutation, the G334V mutant, a familial mutation that leads to the dissociation of the p53 tetramer, malfunction of the protein and cancer development, is considered [54]. The Gemini graph of G334V is generated and network rewiring is investigated (Fig. 6C). The mutation has a strong global effect on the network: all the residues of the p53 intermolecular β-strands from 324 to 334 have their links modified by the mutation, even when they are not directly linked to residue 334. The modifications are: (i) vanishing of links (e.g. D324, G325), (ii) changes in the type of links, such as side chain to backbone (e.g. I332, L330), (iii) a decrease in the number of contacts (e.g. Q331, T329) or (iv) changes of contacts (R333). The changes in the network are not limited to residues of the intermolecular β-strands but extend to interactions between residues that belong to α-helices. This clearly shows that there is significant network rewiring in p53 due to a single node modification, the mutation of residue G334, again supporting the hypothesis. Mutation of other p53 residues, such as T329A or Q331A, leads to similar network rewiring (not shown). Rewiring alone therefore cannot explain the capacity of the G334V mutant to form a fiber, because the T329A and Q331A mutants do not form fibers [54]. The extent of the changes in the network might be such that the intermolecular β-strand interactions are destabilized, promoting chain dissociation, the first step to fiber formation.
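Rewiring between a wild-type and a mutant network can be summarized by comparing their sets of links. The sketch below shows one possible way to do this, assuming each network is given as (residue, residue, link type) tuples. The `rewiring` function and the toy links are hypothetical; they do not reproduce the Gemini output for p53.

```python
def rewiring(wt_links, mutant_links):
    """Compare two networks given as iterables of (res_a, res_b, link_type)."""
    def normalize(links):
        # Treat links as undirected: (a, b) and (b, a) are the same link.
        return {(tuple(sorted((a, b))), kind) for a, b, kind in links}

    wt, mut = normalize(wt_links), normalize(mutant_links)
    lost = wt - mut      # links present only in the wild type
    gained = mut - wt    # links present only in the mutant
    # Residues touched by at least one lost or gained link.
    affected = {res for pair, _ in lost | gained for res in pair}
    return lost, gained, affected

# Toy networks (hypothetical): one link changes type, one link vanishes.
wt = [("A", "B", "SC"), ("B", "C", "BB"), ("C", "D", "SC")]
mutant = [("A", "B", "BB"), ("C", "D", "SC")]
lost, gained, affected = rewiring(wt, mutant)
```

A change of link type (for example side chain to backbone) shows up as one lost and one gained link between the same two residues.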
Conclusion. The key results are: (i) little information is accessible from individual amino acids (i.e. from sequences) and it is the pairs of amino acids that need to be investigated, (ii) the geometry of the amino acid side chains, so far neglected, is a key parameter for understanding pair matching and, finally, (iii) intermolecular β-strands need to be further explored in terms of networks. The intermolecular β-strand networks are rather disconnected networks, with no hubs but with nodes with few links instead. Such a layout has several advantages, as already discussed, but probably the most relevant one is the secluding character of the network, which may well serve to limit the spread of changes, namely the rewiring, and protect the interface from dissociation upon mutation.

Definitions
Graph. A graph, or a network, is a set of many components that interact with each other through pairwise interactions. At a highly abstract level, the components can be reduced to a series of nodes that are connected to each other by links, with each link representing the interaction between two components. The nodes and links together form a network, or, in more formal mathematical language, a graph [55]. The terms nodes and links used in graph theory correspond to amino acids/hot spots and contacts/interactions, respectively, in the present context. The number of links of a node is the degree k of the node. In the networks of interacting hot spots, the residues are connected through different motifs. Two residues connected by only one link make a single pair, while two residues connected by more than one link make a multiple pair. Hot spots involved in a single pair are single contact hot spots. Hot spots with more than one individual contact are called multiple contact hot spots.
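These definitions translate directly into a small data structure. The following Python sketch assumes a network is stored as a list with one entry per individual link, so that a pair of residues may appear more than once. The `classify_pairs` name and the residue labels are hypothetical.

```python
from collections import Counter

def classify_pairs(links):
    """links: iterable of (res_a, res_b), one entry per individual link.

    Returns (degree per node, set of single pairs, set of multiple pairs).
    """
    # Number of links carried by each unordered pair of residues.
    pair_links = Counter(frozenset(link) for link in links)
    degree = Counter()
    for pair, n_links in pair_links.items():
        for node in pair:
            degree[node] += n_links
    single = {pair for pair, n in pair_links.items() if n == 1}
    multiple = {pair for pair, n in pair_links.items() if n > 1}
    return degree, single, multiple

# Toy network (hypothetical labels): L3 and V4 are connected by two links.
links = [("G1", "T2"), ("G1", "L3"), ("L3", "V4"), ("L3", "V4")]
degree, single, multiple = classify_pairs(links)
```

Here T2 has degree k = 1 and is a single contact hot spot, while L3 has degree k = 3 and is a multiple contact hot spot.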
Global propensity (GP). The global propensity of an amino acid is the ratio of its frequency in a defined environment to its frequency in a database. Here the global propensity is the frequency of each amino acid in intermolecular β-strands divided by its frequency in the whole chain.
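As an illustration of this ratio, a minimal Python sketch follows. It assumes the strand residues and the whole-chain residues are available as one-letter sequences; the `global_propensity` name and the toy sequences are hypothetical.

```python
from collections import Counter

def global_propensity(strand_residues, chain_residues):
    """Frequency in intermolecular beta-strands divided by frequency in whole chains."""
    strand = Counter(strand_residues)
    chain = Counter(chain_residues)
    n_strand, n_chain = len(strand_residues), len(chain_residues)
    return {
        aa: (strand[aa] / n_strand) / (chain[aa] / n_chain)
        for aa in chain
    }

# Toy sequences (hypothetical): V is enriched in the strand, G is absent from it.
gp = global_propensity("VVIT", "VVITGGAK")
```

A value above 1 indicates an amino acid that is over-represented in the strands relative to the whole chain, and a value below 1 indicates an under-represented one.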
Local preferences: the local amino acid preferences measure the preferred position of every amino acid on the β-strands. They are calculated as the difference between the frequency of a hot spot at the β-strand extremities (outer position) and its frequency when centrally located (any other position) on the strand.
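One possible reading of this definition is sketched below in Python. It rests on two assumptions that the text does not state explicitly: the outer positions are taken to be the first and last residue of each strand, and each frequency is normalized within its own position class. The `local_preference` name and the toy strands are hypothetical.

```python
from collections import Counter

def local_preference(strands):
    """Outer-position frequency minus central-position frequency, per residue.

    Assumes every strand has at least three residues.
    """
    outer, central = Counter(), Counter()
    for strand in strands:
        for i, aa in enumerate(strand):
            if i == 0 or i == len(strand) - 1:
                outer[aa] += 1
            else:
                central[aa] += 1
    n_outer, n_central = sum(outer.values()), sum(central.values())
    return {
        aa: outer[aa] / n_outer - central[aa] / n_central
        for aa in set(outer) | set(central)
    }

# Toy strands (hypothetical): K sits at the extremities, V in the center.
pref = local_preference(["KVIVE", "TVLVK"])
```

A positive value points to a residue preferentially found at the strand extremities, and a negative value to a residue preferentially found in the center.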
Chemistry of the amino acid side chains: charged amino acids are D, E, H, R and K; polar amino acids are N, Q, S, T and Y; hydrophobic residues are A, C, F, G, I, L, M, P, V and W.
Length of the amino acid side chains: long side chain residues are K, W, R and Y; medium side chain residues are D, N, L, I, H, E, Q, M and F; and short side chain residues are G, A, P, C, S, T and V.

Construction of a non-redundant dataset
The Protein Data Bank (PDB) was first screened at the Research Collaboratory for Structural Bioinformatics (RCSB) for protein oligomers of stoichiometry above 2 and lower than or equal to 12 [56]. Above dodecamers, the number of cases becomes too small for statistical analysis. Dimers are excluded from the dataset because of their diversity of orientation contacts, which implies a broad diversity in recognition contact modes [57]. Viral and membrane proteins were removed because they are likely to follow a different mechanism of interface formation than soluble oligomers. The coordinates of the biological assembly were used to select for non-crystallographic oligomers. Both NMR and X-ray structures are taken into account. PDB entries containing only backbone (BB) atoms, or only a few side-chain (SC) atoms, are discarded by monitoring the ratio of available SC and BB atoms for each of the twenty amino acids. Proteins whose sequences are similar at the 90% identity level are removed. As a result, 6234 PDB entries were submitted to Gemini to describe the whole interface. In a small minority of cases Gemini stops before yielding the interface, mainly because the PDB file contains a single subunit while Gemini expects several. This happens even though biological assemblies were downloaded from the RCSB. At this point, the interface is available for a set of 5248 proteins. Receptor-ligand, enzyme-inhibitor and antigen-antibody interactions involve different ranges of K_D than permanent oligomers and as such are expected to have different recognition modes [42]. They are therefore discarded from the dataset by removing proteins having at least one very short chain (≤20 amino acids). Truncated proteins were also discarded by selecting only cases whose chains differ in length by less than 20 amino acids.
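The stoichiometry and chain-length criteria described above can be expressed as a simple filter. The Python sketch below covers only those three criteria and assumes that each oligomer is summarized by the list of its chain lengths. The `passes_length_filters` name and its parameters are hypothetical and are not part of Gemini.

```python
def passes_length_filters(chain_lengths, min_len=21, max_spread=20):
    """chain_lengths: number of residues in each chain of one oligomer.

    Keeps oligomers of 3 to 12 chains with no very short chain
    (<= 20 residues) and with chains differing by fewer than 20 residues.
    """
    if not 3 <= len(chain_lengths) <= 12:
        return False
    if min(chain_lengths) < min_len:
        return False
    return max(chain_lengths) - min(chain_lengths) < max_spread

# Hypothetical examples: a trimer is kept, a dimer is rejected.
kept = passes_length_filters([100, 100, 100])
rejected_dimer = passes_length_filters([100, 100])
```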
Using the secondary structure annotation provided in the PDB file, the cases with intermolecular β-strands were extracted according to the following set of rules (to be satisfied simultaneously): 1) at least 3 bonds must be between amino acids belonging to β-strands; 2) at least 2 interface amino acids of each subunit must be in a β-β bond; 3) at least 5 interface amino acids must be classified β. The first rule is actually redundant as it is implied by the second and the third. To simplify the treatment, in the case of hetero-oligomers with more than one intermolecular β-strand, only one, randomly chosen, was considered. The final list was screened for redundancies by mapping each PDB code to a UniProt identifier, which allows the appropriate UniProt algorithms to be used to find and remove redundant cases. After this final filtering, we are left with 755 proteins having 1048 regions of intermolecular β-strands.
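The three selection rules can be sketched as a single predicate. The Python example below is an illustration under stated assumptions, not the procedure actually used: it assumes the bonds of one interface between two subunits are given as pairs of (subunit, residue number, secondary structure) tuples, with "E" marking a β-strand residue. The encoding and the `is_beta_interface` name are hypothetical.

```python
def is_beta_interface(bonds):
    """bonds: list of ((subunit, res_id, ss), (subunit, res_id, ss)) tuples,
    where ss is 'E' for a beta-strand residue (hypothetical encoding)."""
    beta_bonds = [(a, b) for a, b in bonds if a[2] == "E" and b[2] == "E"]
    # Rule 1: at least 3 bonds between beta-strand residues.
    rule1 = len(beta_bonds) >= 3
    # Rule 2: each subunit contributes at least 2 residues to beta-beta bonds.
    per_subunit = {}
    for a, b in beta_bonds:
        for sub, res_id, _ in (a, b):
            per_subunit.setdefault(sub, set()).add(res_id)
    rule2 = len(per_subunit) >= 2 and all(len(r) >= 2 for r in per_subunit.values())
    # Rule 3: at least 5 interface residues are classified beta.
    beta_residues = {(r[0], r[1]) for a, b in bonds for r in (a, b) if r[2] == "E"}
    rule3 = len(beta_residues) >= 5
    return rule1 and rule2 and rule3

# Toy interface (hypothetical): three beta-beta bonds between subunits A and B.
bonds = [
    (("A", 10, "E"), ("B", 20, "E")),
    (("A", 11, "E"), ("B", 21, "E")),
    (("A", 12, "E"), ("B", 22, "E")),
]
accepted = is_beta_interface(bonds)
```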

Hot spots in interaction
A pair of hot spots is made of a hot spot -a- interacting with a hot spot -b-. Some hot spots participate in more than one pair at the same time and it is necessary to avoid counting them multiple times. A pair (A1, A2) is counted 1/n times, with n the number of bonds of A1. Consider a hot spot G forming a pair with T and another pair with L. Each of the (G, T) and (G, L) pairs is counted as one half, so the occurrence of G is equal to one, and not to two as it would be if the pairs (G, T) and (G, L) had each been counted as one. This counting procedure implies that the tables of occurrences must be read row-wise (Tables 5 and 6). When the number of interactions (bonds) issued from a hot spot is counted instead of the pair occurrences, such normalization is unnecessary.
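The 1/n weighting can be illustrated with the G, T and L example from the text. The Python sketch below assumes each pair is stored as (identifier of the first hot spot, its residue type, identifier of the second hot spot, its residue type). The `weighted_pair_counts` name and the identifiers are hypothetical.

```python
from collections import defaultdict

def weighted_pair_counts(pairs):
    """pairs: list of directed (a_id, a_type, b_id, b_type) hot spot pairs.

    Each pair is counted 1/n, with n the number of bonds of the first hot spot,
    so every hot spot contributes a total of one to its row.
    """
    n_bonds = defaultdict(int)
    for a_id, _, _, _ in pairs:
        n_bonds[a_id] += 1
    table = defaultdict(float)
    for a_id, a_type, _, b_type in pairs:
        table[(a_type, b_type)] += 1.0 / n_bonds[a_id]
    return dict(table)

# One G hot spot paired with a T and with an L, as in the text.
pairs = [("g1", "G", "t1", "T"), ("g1", "G", "l1", "L")]
counts = weighted_pair_counts(pairs)
```

Each of the (G, T) and (G, L) entries receives one half, so the row of G sums to one, which is why the resulting table has to be read row-wise.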

Statistical tools
χ². n_ab and n_ba pair occurrences. The total observed pair occurrences n_ab and n_ba are calculated for each residue as the sum of the occurrences on a row and the sum of the occurrences on a column (Tables 5 and 6 for the BB and SC sub-networks, respectively). The significance of the differences between the occurrences n_ab and n_ba was assessed using a χ² test with one degree of freedom (equation 2), calculated as follows:
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (2)$$
with O_ij the observed occurrences (line i and column j of Tables 5 and 6) and E_ij the expected occurrences, calculated as the average value of the total observed pair occurrences n_ab and n_ba. The sums run over the n_ab and n_ba occurrence values. For one degree of freedom, a χ² value below 3.84 is not significant (5% significance threshold).
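For a single residue, the test described above compares two totals with their average. A minimal Python sketch follows, assuming the standard Pearson form of the χ² statistic (the sum of squared differences between observed and expected values, each divided by the expected value). The `chi2_row_vs_column` name and the counts are hypothetical.

```python
def chi2_row_vs_column(n_ab, n_ba):
    """Chi-square (1 degree of freedom) comparing row and column totals."""
    # Expected value: average of the two observed totals.
    expected = (n_ab + n_ba) / 2.0
    return sum((obs - expected) ** 2 / expected for obs in (n_ab, n_ba))

CRITICAL_5_PERCENT = 3.84  # threshold quoted in the text for 1 degree of freedom

# Hypothetical totals, not taken from Tables 5 and 6.
chi2 = chi2_row_vs_column(60, 40)
significant = chi2 >= CRITICAL_5_PERCENT
```

With totals of 60 and 40 the expected value is 50 and the statistic exceeds the threshold, whereas totals of 55 and 45 would not.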
Observed (f_ab) and expected (f_a × f_b) values. The significance of the differences between the observed (f_ab) and expected (f_a × f_b) pair frequencies was also assessed using a χ² test, with O_ij and E_ij the observed and expected pair frequencies, respectively. This time it is calculated over a matrix in which low occurrences (below 5) are summed, and a p-value is calculated.
Binomial law. This law gives the probability P(k) of making k links over n trials, with p the probability of making a link and (1 − p) the probability of making no link (equation 3):
$$P(k) = \binom{n}{k} p^{k} (1-p)^{n-k} \qquad (3)$$
Thus the probability for any SC hot spot to make k links (i.e. k contacts) is calculated as the product of the probability of making k links and the probability of making no link in the remaining n − k trials, weighted by the number of ways of choosing the k trials. When the calculated values are close to the observed values, the binomial law is a good model for estimating the number of links of the hot spots.
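The binomial model can be evaluated in a few lines. The Python sketch below uses the standard binomial probability mass function, P(k) = C(n, k) p^k (1 − p)^(n − k). The values of n and p are hypothetical and are not fitted to the data of this study.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k links in n trials with link probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical parameters, not fitted to the paper's data.
n, p = 10, 0.15
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
```

The resulting list can then be compared, value by value, with the observed fractions of hot spots having k contacts.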

Virtual mutation
FoldX was used to generate the virtual mutation G334V in the PDB structure of the p53 tetramerization domain, following the instructions in [58,59].

Availability of supporting data
The list of the 755 PDB cases and their respective intermolecular β-strands is available on request.