Molecular Models of STAT5A Tetramers Complexed to DNA Predict Relative Genome-Wide Frequencies of the Spacing between the Two Dimer Binding Motifs of the Tetramer Binding Sites

STAT proteins bind DNA as dimers and tetramers to control cellular development, differentiation, survival, and expansion. The tetramer binding sites are comprised of two dimer-binding sites repeated in tandem. The genome-wide distribution of the spacings between the dimer binding sites shows a distinctive, non-random pattern. Here, we report on estimating the feasibility of building possible molecular models of STAT5A tetramers bound to a DNA double helix with all possible spacings between the dimer binding sites. We found that the calculated feasibility estimates correlated well with the experimentally measured frequency of tetramer-binding sites. This suggests that the feasibility of forming the tetramer complex was a major factor in the evolution of this DNA sequence variation.


Introduction
STAT (Signal Transducer and Activator of Transcription) proteins are activated by interferons, cytokines, and growth factors as a critical cytoplasmic to nuclear signaling mechanism [1]. Classically, STAT proteins were shown to be activated by tyrosine phosphorylation, allowing their dimerization and nuclear translocation [2]. Subsequently, N-terminal domain (NTD) mediated dimerization of dimers to form tetramers was also demonstrated [3]. We previously showed that STAT5 tetramerization is critical for normal cytokine responses and immune function [4]. Moreover, STAT5 tetramers, but not dimers, were reported to be associated with leukemia development in mice [5]. Tetramerization of STAT1 is also vital for normal immune function [6]. Many specific genes have been shown to be activated by STAT protein tetramers, including those encoding IL-2 receptor α [7], interferon-γ [8], α2-macroglobulin [9,10], and

Results
STAT proteins contain six domains [14-16] and connecting peptides (Fig 1). The "core" of the molecule is made of the four middle domains, lacking the N-terminal (NTD) and the transactivation (TAD) domains and all but three residues of the phospho-tyrosine containing segment (PTS). There is a 13-residue flexible linker between the NTD and the core. Tetramer models of the DNA-bound phosphorylated core at a particular DNA spacing have been built previously for STAT5A [12] and STAT1 [15]. We built tetramer models of STAT5A with all spacings between the dimer binding sites and included considerations of the N-terminal dimer formation. A flow chart of our model building process is given in Fig 2. We first built a DNA-bound, phosphorylated core dimer model using the crystal structures of STAT5A in its un-phosphorylated form (1y1u) [16] and of STAT3β in its phosphorylated dimer form interacting with DNA (1bg1) [14]. The core dimer was then duplicated and the duplicated dimer rotated and slid, one base pair at a time, along an ideal B-DNA with 10.5 bps per turn (see S1 Movie). The N-terminal domain dimer (NDD) that forms between two core dimers was built separately by homology modeling using the STAT4 NDD structure (1bgf) as a template. For each core dimer pair with a particular DNA spacing, the NDD was then placed at many different locations and orientations, and we assessed whether the NDD could be connected to the two core dimers by means of the 13-residue linker on each side of the NDD.
We labeled the two monomers of one core dimer as A and D and the corresponding monomers in the duplicated dimer as A' and D'. As the A'-D' core dimer is slid along the DNA away from the A-D core dimer, it also rotates around the DNA axis, so the distance between a pair of N-termini of the moving and stationary dimers oscillates while generally increasing (Fig 3). Two core dimers with a CTCD value of 9 or less clashed with each other. Also, two dimers could not be connected by an N-terminal domain dimer (NDD) when the CTCD was greater than 39 (Fig 3 and Methods). Therefore, we built models with CTCD values from 10 to 39.
There are two topologically distinct tetramers depending on which two monomers are connected by an NDD. All core tetramer models have a 2-fold symmetry axis, which runs perpendicular to the DNA axis and relates monomers A to D' and D to A' (Fig 4). An NDD can connect the symmetry-related monomers, which come near each other on the same side of DNA when the two core dimers are rotated approximately 180° from one another ('staggered' tetramer), at CTCD values of ~15 or 25 (green and blue lines in Fig 3; Fig 4A and 4B; and S2 Movie). Since an NDD is itself 2-fold symmetric, we placed these NDDs on the symmetry axis, one on each side of DNA (on-axis NDDs). Another type of tetramer results when an NDD connects monomers A and A', in which case the symmetry-related NDD can connect D and D'. These monomers come near each other when the two core dimers are related essentially by translation along the DNA axis with no or relatively small rotation ('eclipsed' tetramer) at CTCD values of ~10, 20 or 30 (red line in Fig 3; Fig 4C and 4D; and S3 Movie). The NDDs in this case are away from the symmetry axis (off-axis NDDs).
For both types of tetramers, the separately built NDD was placed at many discrete points and orientations, and the probability of connecting each to a pair of core monomers was assessed (see Methods). The relative feasibility, F2, of forming a tetramer using one or two NDDs was calculated as the sum of these probabilities. We note that when only one NDD connects the two dimers, there are two other NTDs per tetramer, which are free and can be used to form a higher order oligomer; however, we do not consider higher oligomers in this paper. The relative feasibility, F4, of forming a tetramer using two NDDs, which may be required for the stable cooperative tetramer formation on DNA, is the product of the sums of the probabilities with NDDs on each side of DNA.
Fig 1 legend: We treated the DBD and LD as one combined domain, as SCOP [17] does for STAT3β. There is a short flexible 13-residue linker between the NTD and CCD, which partly overlaps with the NTD. There is also a phospho-tyrosine containing segment (PTS) between the SH2 and TAD, which we combined with the TAD. See Methods for more detail. (B) The sequence of mouse STAT5A (NCBI Accn# CAA88419.1). The residues are highlighted according to the domains to which they belong, using the same coloring scheme used in (A). The 13-residue linker is boxed. The phospho-tyrosine residue (Y694) and the two following residues (V695 and K696) are shown in red. (C) STAT5A domains in the modeled structure of the STAT5A core, which includes the CCD, DBD, LD, SH2 and the residues 694-696 of the PTS. Dotted line represents the connecting residues (685-693) that were not included in the model.
The computed feasibility measures are given in Table 1 and shown in Fig 5A (for F4) and Fig 5B (for F2). Since NDDs at on-axis positions are constrained to lie on the symmetry axis whereas off-axis positions have no such constraint, there are many more possible positions for off-axis NDDs, which increases the feasibility values. Therefore, we scaled the on- and off-axis feasibility measures separately so that the sum best matched the experimental frequencies (see Methods). Both Fig 5A and 5B show five peaks, at CTCDs of 10-11, 15, 20-21, 25-26 and 31-32, similar to those seen in the plot of the experimentally measured frequencies (thin blue lines in Fig 5A and 5B). These peaks represent both 'eclipsed' (red peaks at CTCDs of around 10, 20, and 31) and 'staggered' (blue peaks at CTCDs of around 15 and 26) tetramers. The troughs are at or near the valleys in the experimental frequency distribution (e.g., at or near CTCD values of 13, 18, 23, and 28). The inter-dimer rotation angles in these tetramers are approximately ±90°. Note also that the computed feasibility measure is zero or close to zero for CTCD values larger than 32. This is close to the CTCD value of 31, beyond which the experimentally observed frequency of tetramer binding sites is also at the background level.
There are some obvious differences. For example, the calculated values are zero or nearly zero at CTCD values around 13, 18, 23, and 29, where the experimental frequencies appear to be significant. Also, as noted above, there are clashes between the core dimers when the CTCD value is 9 (gap length 0), but the experimental data (Fig 4D of ref. [4]) show a small, non-zero frequency at this spacing. The main causes of these discrepancies are presumably that our models are rigid and that the DNA maintains the ideal geometry. There are clashes between core dimers even at CTCDs of 10-12, but we ignored them because they are small in number (<10) and appear to be avoidable by local alterations of the conformation of flexible loops and/or by small bending and twisting of the DNA. To see the effect of the twisting of DNA, we performed the same computations using a DNA with 10.0 bps per turn instead of 10.5 bps (Table 2). The calculated feasibility measures were similar for the two DNAs, but the peaks and troughs generally were shifted to the left with the 10.0 bps/turn DNA (Fig 6), so that the fit to the experimental frequency data was generally poorer than with the 10.5 bps/turn DNA (Fig 5). However, the position of the trough at a CTCD of 28 was nearly reproduced with the 10.0 bps/turn DNA. Thus, while the main features of the gap length dependence of the tetramer binding site frequencies can be understood without considering DNA and protein flexibility, such flexibility may have contributed to shaping the finer features of the frequency distribution.
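The link between spacing, helical repeat, and the relative rotation of the two core dimers can be illustrated with a short calculation. The sketch below assumes only that the second dimer rotates by 360°/(bps per turn) for each base pair of CTCD. The function names and the 45° tolerance used to label a spacing are illustrative choices and are not part of the modeling procedure.

```python
def inter_dimer_rotation(ctcd, bp_per_turn=10.5):
    """Rotation of the second dimer about the DNA axis, in degrees within (-180, 180]."""
    angle = (ctcd * 360.0 / bp_per_turn) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


def arrangement(ctcd, bp_per_turn=10.5, tolerance=45.0):
    """Label a spacing as 'eclipsed' (~0 deg), 'staggered' (~180 deg) or 'intermediate'.

    The tolerance is a hypothetical cutoff chosen only for this illustration.
    """
    rotation = abs(inter_dimer_rotation(ctcd, bp_per_turn))
    if rotation <= tolerance:
        return "eclipsed"
    if rotation >= 180.0 - tolerance:
        return "staggered"
    return "intermediate"


if __name__ == "__main__":
    # Compare the two helical repeats used in the text over the modeled CTCD range.
    for n in range(10, 33):
        print(n,
              round(inter_dimer_rotation(n, 10.5), 1), arrangement(n, 10.5),
              round(inter_dimer_rotation(n, 10.0), 1), arrangement(n, 10.0))
```

With 10.5 bps per turn the rotation returns to zero at a CTCD of 21, whereas with 10.0 bps per turn it does so at 20, which is one way to see why the peaks and troughs move to smaller CTCD values for the more tightly wound DNA.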

Discussion
Figs 5 and 6 clearly show that the gap length dependence of the calculated feasibility measures shares critical features with that of the experimentally measured frequency of tetramer binding sites in the mouse genome, including the number and position of the five peaks and four valleys. Our models and calculations, therefore, suggest that the peaks at or near the CTCD values of 10, 20 and 30 correspond to eclipsed tetramers connected by one or two off-axis NDDs while the peaks at or near CTCD values of 15 and 26 correspond to staggered tetramers connected by on-axis NDD(s). In general, the frequency with which a particular spacing occurs in a genome is an evolutionary feature that need not be related to the ease with which a tetramer can be formed. For example, the frequency of a spacing could be random or dictated by the biological usefulness of the target genes so long as a tetramer can be formed at that spacing. However, our study indicates that the frequency is in fact dictated mainly by the feasibility of forming a tetramer.
This point is most clear from the low but non-zero frequency of the spacing at which the calculated feasibility measure is zero or nearly zero, e.g. at the CTCD value of 18 or 22. The non-zero frequency of tetramer binding sites with these spacings indicates that a tetramer complex can be formed on DNA sites with one of these spacings, presumably because of the flexibility of DNA and of surface loops of the protein. On the other hand, the fact that the frequency is low at these spacings indicates that the frequency is not random but related to the feasibility of forming the tetramer complex. We note here that our feasibility measure is essentially a weighted number of different positions and orientations of the N-terminal domain dimer (NDD) at which it can connect a pair of protein core dimers through the two 13-mer peptide linkers. It is not a full measure of the stability of the tetramer complex because it does not include the energy of protein-DNA interaction, calculation of which is computationally demanding. We note also that some of the binding sites with significant F2 values may involve higher order oligomerization, which we did not consider in this study.
On the basis of the observation that mutations on the surface of NDD affect the stability of the tetramer complex, Staab et al. [18] recently suggested a model of the tetramer complex in which the two core dimers are arranged in an eclipsed state and connected by one on-axis NDD sitting close to the DNA. Our calculations show that a model of this type is possible (although possible positions of NDD are not in the 'front' but on the 'backside' of DNA in between the SH2 domains, see Fig 4C and 4D) but is associated with a low feasibility measure. At the CTCD of 21, for example, when the two core dimers are maximally eclipsed for the 10.5 bps/turn DNA, the F2 value (Table 1) for the on-axis NDD is non-zero but small; in most positions, the NDD either clashes with some parts of the core or is too far from the N-termini of the core coiled coil domains for the 13-mer peptide linker to connect (see S1 Tables). It is possible that this model, with NDD at a few barely possible positions, is stabilized energetically by an interaction with DNA, which our calculations do not include (see above). It is interesting that the possibility of this model is not needed in order to produce a high feasibility at the CTCD of 21 because there are many off-axis NDD positions that can connect a pair of core dimers at this separation.
The reason that the number of tetramer binding sites in a genome should correlate with the feasibility of forming a tetramer is not entirely clear. We are not aware of any previous work reporting such a correlation between a feature of DNA sequence evolution and a measure of ease with which a protein-DNA complex can be formed. It is possible that there are many more weaker tetramer binding sites on the genome and that the observed frequency represents only those sites that have appreciable binding at the physiological concentration of STAT.
By estimating a measure of the feasibility of the tetramer formation, our model can therefore predict the number of such sites in the genome relative to those with other spacings. Given the high homology of human vs. mouse STAT5A as well as the homology of STAT5A to other STAT proteins, including particularly STAT5B but also STAT3, the models we have built here are applicable to other STAT proteins as well.

STAT5A sequence and domain definitions
The Mus musculus STAT5A protein sequence (NCBI Accn# CAA88419.1, http://www.ncbi.nlm.nih.gov/) used in this study and its domain boundaries are shown in Fig 1. The N-terminal domain (NTD, residues 1-127, highlighted cyan) corresponds to the visible residues in the STAT4 NTD structure [19]. The domain boundaries for the coiled-coil domain (CCD, 138-331, yellow), DNA binding domain and linker domain (DBD+LD, 332-594, green), and SH2 domain (595-684, magenta) correspond to the four domains of STAT3β structure according to the alignment between STAT3β and STAT5A [14]. The transactivation domain (TAD, 712-793, un-highlighted) and the linker between SH2 and TAD, which we call the phospho-tyrosine containing segment (PTS, 685-711, un-highlighted), also correspond to those defined for STAT3 [14]. The domain boundaries are similar according to the STAT1 and STAT5A structures [15,16], except that the SH2 domain in the STAT5A structure was defined to include the phospho-tyrosine containing segment.
Most of the residues of the PTS are not visible in the crystal structure of STAT5A [16] and the solution conformation of the few visible residues (residues 685-690) is uncertain because the corresponding residues are missing in the structure of STAT3β [14]. Presumably, this region of the sequence assumes flexible structures. It is poorly conserved among different STAT proteins except for three residues including the phosphorylated tyrosine residue 694 [14,17,20]. No crystal structure is available for the TAD, although NMR structures of this domain from STAT1 and STAT2 proteins in complex with TAZ1 and TAZ2 domains of CREB-binding protein (CBP) are available [21]. The TAD sequences are not similar among different STAT proteins including STAT5A [20] and are probably intrinsically unstructured in the absence of the cognate co-factors. We therefore excluded both PTS and TAD in our models except for residues 694-696 (YVK in red font in Fig 1B), which include the phospho-tyrosine and two additional residues.

Construction of STAT5A core dimer models
Crystal structures for STAT5A in its un-phosphorylated form (1y1u) [16] and for STAT3β in its phosphorylated dimer form interacting with DNA (1bg1) [14] are available. Since STAT5A is homologous to STAT3β (32% identical by NCBI BLAST pairwise alignment), we built a model of phosphorylated, DNA bound STAT5A core dimer by using the STAT3β dimer-DNA structure (1bg1) and superimposing and separately replacing each of the core domains, CCD, DBD+LD, and SH2, by the corresponding domains in the structure of the un-phosphorylated STAT5A (1y1u). The resulting structure was not refined in any way, as we did not want to potentially deviate from the observed phosphorylated dimer-DNA complex structure. Residues 694-696 in the PTS of the STAT3β structure (1bg1) were retained in the core dimer model of STAT5A. The structure of the core dimer of STAT1 [15] is very similar to that of STAT3β, but we used STAT3β because of its greater sequence similarity to STAT5A.

Construction of STAT5A core tetramer models
The 3D-DART webserver (http://haddock.chem.uu.nl/dna/dna.php) was used to build an ideal 60-mer B-DNA with 10.5 or 10.0 bps per turn. Two copies of the DNA-bound core STAT5A dimer, built as above, were placed on this DNA by superimposing the two 18-mer DNAs of the dimers on the 60-mer DNA with a desired base pair separation between them. The 18-mer DNAs were then removed.
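Under ideal B-DNA geometry, placing the second core dimer n base pairs away is equivalent to a screw motion about the DNA axis. The sketch below is only a geometric illustration of that equivalence, not the superposition procedure described above. It assumes that the DNA axis coincides with the z-axis and uses the rise and rotation per base pair quoted in the Methods for the 10.5 bps/turn DNA; the function name is hypothetical.

```python
import math

RISE = 3.37   # rise per base pair in angstroms (10.5 bps/turn DNA, as quoted in the Methods)
TWIST = 34.3  # rotation per base pair in degrees (10.5 bps/turn DNA, as quoted in the Methods)


def slide_along_dna(coords, n_bp, rise=RISE, twist_deg=TWIST):
    """Apply a right-handed screw motion about the z-axis to a list of (x, y, z) points.

    Each point is rotated by n_bp * twist_deg about z and shifted by n_bp * rise along z.
    Assumes the DNA axis is the z-axis (an assumption of this sketch).
    """
    theta = math.radians(n_bp * twist_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z + n_bp * rise) for x, y, z in coords]


if __name__ == "__main__":
    # A single point at the radius of the N-terminal C-alpha atoms, moved by 21 bp.
    print(slide_along_dna([(49.79, 0.0, 0.0)], 21))
```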

Construction of the N-terminal domain dimer (NDD)
The structure of the NDD is available for STAT4 (1bgf) [19]. Since the NTD of STAT5A is homologous to that of STAT4 (NCBI BLAST pairwise alignment gives 34% identity), we built the NDD of STAT5A by homology modeling from the STAT4 NDD structure (1bgf) using the SWISS-MODEL web server (http://swissmodel.expasy.org). The dimer interface used to build the dimer was the reinterpreted dimerization interface of Chen et al. [22].

An upper limit of the distance between the core dimers
An upper limit of the distance between the core dimers is reached when the two core dimers are so far apart that their N-terminal domains (NTDs) cannot come close enough to dimerize. We determined this upper limit using the condition

$$D_{nn} < D_{cc} + 2 D_{max}$$

where Dnn is the distance between the N-terminal Cα atoms of the two CCDs that one potential NDD connects, Dcc is the distance between the Cα atoms of the two C-terminal residues (Asn 124) of the NDD, which is 55.8 Å in the model we built, and Dmax is the maximum possible end-to-end distance of the 13-mer peptide linker between NTD and CCD of a STAT5A monomer, which we took to be 40 Å (see below). Therefore, a condition for the upper limit is Dnn < 136 Å.

Distance between two N-terminal Cα atoms of two dimers of a core tetramer
There are four possible core monomer pairs that may be linked by two N-terminal domains in between: A-A', A-D', D-A', and D-D'. Two of these (A-A' and D-D') are equivalent by 2-fold symmetry. Assuming that the DNA does not bend and the core dimers are rigid, the distances, Dnn, between the N-terminal Cα atoms of each of these pairs can be calculated using the following formulas:

$$D_{nn}(AA') = D_{nn}(DD') = \sqrt{(nz)^2 + [2r\sin(n\alpha/2)]^2}$$

$$D_{nn}(AD') = \sqrt{[nz + dz(AD)]^2 + [2r\cos(n\alpha/2 - \beta)]^2}$$

$$D_{nn}(DA') = \sqrt{[nz - dz(AD)]^2 + [2r\cos(n\alpha/2 + \beta)]^2}$$

where letters in parentheses after a distance indicate the core monomer pair, n is equal to the CTCD, z and α are the rise and the rotation angle per bp of the DNA double helix, and dz(AD), r, and β are geometrical properties of a core dimer as described below. For the B-DNA with 10.5 bps per turn that we used, z = 3.37 Å and α = 34.3°. Let the cylindrical coordinates of the N-terminal Cα atoms of the monomers A and D be (z, r, φ) with subscripts A and D respectively, with the z-axis along the DNA axis. Then dz(AD) = zD - zA, r = rA = rD, and β = [180° - (φD - φA)]/2. For the core dimer model we built, dz(AD) = 13.64 Å, r = 49.79 Å, and β = 0.4°.
The A-A' and D-D' distances become larger than the upper limit of 136 Å, and therefore an eclipsed tetramer with an off-axis NDD will not form, when the CTCD value is larger than 34. The A-D' distance is beyond the limit when the CTCD value is larger than 29 whereas the D-A' distance reaches the limit at the CTCD value of 39. The experimental data show that the number of tetramer binding sites detected is very low when the CTCD value is larger than 31 (gap length of 22 in Fig 4D of ref. [4]), which suggests that tetramer formation with only one NDD connection is rare, although the little blips in frequency at CTCDs of 36 and 37 may be an indication that a tetramer with only one NDD, presumably on-axis connecting D and A' monomers, can form.
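The distance limits quoted in this section can be checked numerically. The following sketch derives the distances from the cylindrical-coordinate definitions given above, under the assumption that the second dimer is displaced by n·z along the DNA axis and rotated by n·α about it. The function names are hypothetical, and the upper limit is taken as Dcc + 2·Dmax = 135.8 Å.

```python
import math

RISE, TWIST = 3.37, 34.3              # z (angstroms) and alpha (degrees) per bp, from the text
DZ_AD, R, BETA = 13.64, 49.79, 0.4    # dz(AD) (angstroms), r (angstroms), beta (degrees)
D_CC, D_MAX = 55.8, 40.0              # NDD C-terminal separation and maximum linker length
LIMIT = D_CC + 2.0 * D_MAX            # upper limit on Dnn, about 136 angstroms


def _cylinder_distance(dz, dphi_deg):
    """Distance between two points on a cylinder of radius R separated by dz and dphi."""
    chord = 2.0 * R * math.sin(math.radians(dphi_deg) / 2.0)
    return math.hypot(dz, chord)


def dnn(pair, n):
    """Distance between the N-terminal C-alpha atoms of a monomer pair at CTCD = n."""
    if pair in ("AA'", "DD'"):
        return _cylinder_distance(n * RISE, n * TWIST)
    if pair == "AD'":
        return _cylinder_distance(n * RISE + DZ_AD, n * TWIST + 180.0 - 2.0 * BETA)
    if pair == "DA'":
        return _cylinder_distance(n * RISE - DZ_AD, n * TWIST - 180.0 + 2.0 * BETA)
    raise ValueError(f"unknown monomer pair: {pair}")


def largest_connectable_ctcd(pair, n_max=60):
    """Largest CTCD (up to n_max) at which the pair is still within the linker limit."""
    return max(n for n in range(1, n_max + 1) if dnn(pair, n) < LIMIT)


if __name__ == "__main__":
    for pair in ("AA'", "AD'", "DA'"):
        print(pair, largest_connectable_ctcd(pair))
```

Under these assumptions the largest connectable CTCD values are 34 for A-A', 29 for A-D', and 39 for D-A', in agreement with the limits stated in the text.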

End-to-end distance distribution of the linker between the NTD and CCD
There are ten residues (residues 128-137: SPAGVLVDAM) between the N-terminal and coiled-coil domains that are not visible in the STAT5A crystal structure [16]. We assume that this peptide fragment is flexible. The fact that the corresponding part is missing in the full STAT1 structure (1yvl) is consistent with such flexibility. The residues flanking these linker residues on the N-terminal side are the C-terminal residues of the NTD, the last three of which appear to be flexible in the NTD structure of STAT4, being outside of the last helix. We included these three residues as a part of the linker, so that the length of the linker was increased to 13 residues (residues 125 to 137 shown boxed in Fig 1B: NCSSPAGVLVDAM).
We correspondingly shortened the NTD by three residues at the C-terminal end; whenever we refer to the C-terminal end of the NTD or the NDD, we refer to this shortened end.
Instead of constructing possible structures for the 13-residue linker, we computed the end-to-end distance distribution of all 13-residue peptides in a subset of all the protein structures in the Protein Data Bank (PDB), which consisted of the structures with resolution < 2.5 Å and R-factor < 1.0 that were deposited before 03/23/2012 (19,412 chains in total). Each 13-residue peptide was aligned to the 13-residue linker sequence and the Blosum62 score [23] was computed for the alignment. The histograms of the end-to-end distances of the 13-residue peptides with high Blosum62 scores are shown in Fig 7. The end-to-end distances range between 4 and 40 Å, with a peak at ~19 Å. The overall distribution was rather insensitive to the particular Blosum62 score cutoff value. The histogram of the peptides with Blosum62 score greater than 8 was converted to the probabilities of the end-to-end distances in 1 Å bins. These probabilities were used as the probability of linking the NTD to the CCD with the given end-to-end distance.
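The conversion of the end-to-end distance histogram into linking probabilities can be sketched as follows. This is a minimal illustration that assumes the distances of the selected peptides are already available as a list of numbers in Å; the function names and the example values are hypothetical, and each distance is assigned to a 1 Å bin by truncation to an integer.

```python
from collections import Counter


def end_to_end_probabilities(distances):
    """Bin end-to-end distances (angstroms) into 1-angstrom bins and normalize the counts."""
    counts = Counter(int(d) for d in distances)  # truncation assigns each distance to a bin
    total = sum(counts.values())
    return {bin_start: count / total for bin_start, count in sorted(counts.items())}


def link_probability(probabilities, distance):
    """Probability that the linker spans the given distance; zero for unobserved bins."""
    return probabilities.get(int(distance), 0.0)


if __name__ == "__main__":
    # Hypothetical distances, used only to show the call pattern.
    probs = end_to_end_probabilities([18.2, 19.5, 19.9, 25.0])
    print(probs)
    print(link_probability(probs, 19.3), link_probability(probs, 41.0))
```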

Calculation of feasibility measures for tetramer formation with on-axis NDDs (Fig 4A and 4B)
An NDD was placed so that its 2-fold axis coincided with the 2-fold axis of the core tetramer-DNA complex. The NDD was then moved along the 2-fold axis on both sides of DNA at 10 Å intervals. At each position, the NDD was flipped 180° around the vertical (perpendicular to the 2-fold) axis and both the flipped and un-flipped NDDs were rotated about the 2-fold axis at 30° intervals. At each position and orientation of an NDD, three numbers were calculated: the number of clashes between the NDD and the core tetramer and the two distances from the C-terminus of a monomer of the NDD to the N-termini of the two core monomers that are on the same side of DNA as the NDD. A "clash" is defined as an instance when a Cα to Cα non-bonded distance is less than 5 Å or when a Cα atom of an NDD comes within van der Waals contact with a DNA atom, as defined by Chimera. The number of clashes and the two distances are reported in S1 Tables and S3 Tables for models with 10.5 and 10.0 bps/turn DNA, respectively. If a clash occurred, the probability p(x, ϕ) that a tetramer will form with an NDD at position x and orientation ϕ was set to zero. Otherwise, the probability was calculated as $p(x, \phi) = p_1^2 + p_2^2$, where p1 and p2 are the probabilities that a 13-mer peptide will have the end-to-end distance equal to the two distances calculated, respectively. Note that p1 and p2 denote probabilities of different ways of linking an NTD to the core dimer and that each needs to be squared because two symmetric connections are made in each case. These probabilities are also reported in the S1 Tables and S3 Tables. We take the sum

$$F_2 = \sum_{AD'} p(x, \phi) + \sum_{DA'} p(x, \phi)$$

as a relative measure of the feasibility of forming a tetramer linked by at least one on-axis NDD, which connects two core monomers. Here, the first summation is over all positions and orientations of an NDD on one side of DNA, connecting the core monomers A and D', and the second summation is for positions of an NDD on the other side of DNA, connecting D and A'. We take the product

$$F_4 = \left[\sum_{AD'} p(x, \phi)\right] \left[\sum_{DA'} p(x, \phi)\right]$$

as the relative feasibility measure for forming a tetramer with two connections involving both NDDs and all four core monomers. The computed values of the probability sums and of F2 and F4 are given in the "on-axis" columns of Table 3 for the 10.5 bps/turn DNA and of Table 4 for the 10.0 bps/turn DNA.
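The bookkeeping for the on-axis feasibility measures can be summarized in a few lines. The sketch below assumes that each NDD placement has already been reduced to a clash count and the two linking probabilities p1 and p2; the function names and the example numbers are hypothetical and do not come from the supporting tables.

```python
def on_axis_probability(n_clashes, p1, p2):
    """p(x, phi) for one on-axis placement: zero if any clash, else p1^2 + p2^2."""
    if n_clashes > 0:
        return 0.0
    return p1 ** 2 + p2 ** 2


def on_axis_feasibility(side_ad, side_da):
    """Return (F2, F4) from placements on the A-D' side and on the D-A' side of the DNA.

    Each argument is a list of (n_clashes, p1, p2) tuples, one per position and orientation.
    """
    sum_ad = sum(on_axis_probability(*placement) for placement in side_ad)
    sum_da = sum(on_axis_probability(*placement) for placement in side_da)
    return sum_ad + sum_da, sum_ad * sum_da


if __name__ == "__main__":
    # Hypothetical placements, used only to show the call pattern.
    f2, f4 = on_axis_feasibility([(0, 0.1, 0.2), (3, 0.5, 0.5)], [(0, 0.3, 0.0)])
    print(f2, f4)
```

Because F4 is a product, it vanishes whenever no clash-free, connectable placement exists on one side of the DNA, even if F2 is non-zero.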

Calculation of feasibility measures for tetramer formation with off-axis NDDs (Fig 4C and 4D)
A lattice of grid points was set up using a cylindrical coordinate system. The axis of the cylinder was chosen as the line connecting the N-terminal ends of the monomers D and D'. The grid points had the coordinates (z, r, ϕ) where z, r, and ϕ are the distance along, radial distance from, and the azimuthal angle around the cylinder axis, respectively. The distances z and r were varied in 10 Å intervals and ϕ was varied differently depending on the value of r in such a manner that the grid points were separated roughly by 10 Å in all directions. At each grid point, the NDD was placed with the NDD line, defined as the line joining the two C-termini of the NDD monomers, parallel to the z-axis and then rotated about the NDD line at intervals of 60°. At each position and orientation of the NDD, again three numbers were calculated: (1) the number of clashes between the NDD and the core tetramer, (2) the distance from the N-terminus of monomer D to the nearer of the two C-termini of the NDD, and (3) the distance from the N-terminus of D' to the other C-terminus of the NDD. These data are reported in S2 Tables and S4 Tables for models with 10.5 and 10.0 bps/turn DNA, respectively. If a clash occurred, the probability p(z, r, ϕ, θ) that a tetramer will form with an NDD at position (z, r, ϕ) and orientation θ was set to zero. Otherwise, the probability was calculated for the position and orientation as $p(z, r, \phi, \theta) = p_1 \cdot p_2$, where p1 and p2 are the probabilities that a 13-mer peptide will have the end-to-end distance equal to the two distances calculated, respectively. These probabilities are also reported in the S2 Tables and S4 Tables. We then take the sum form

$$F_2 = 2 \sum p(z, r, \phi, \theta)$$

as the measure of the feasibility of forming a tetramer with at least one off-axis NDD, which connects two core monomers. Here, the summation is over all positions and orientations of an NDD connecting monomers D and D'. The probability is doubled because for every D-D' connection, a symmetry-related A-A' connection is also possible. We take the product form

$$F_4 = \left[\sum p(z, r, \phi, \theta)\right]^2$$

as the feasibility measure for forming a tetramer with two connections involving both NDDs.
The computed values of the probability sum and of F2 and F4 are given in the "off-axis" columns of Table 3 for the 10.5 bps/turn DNA and of Table 4 for the 10.0 bps/turn DNA.
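The off-axis search can be illustrated with a sketch of the cylindrical grid and of the resulting feasibility sums. The exact rule used to vary ϕ with r is not spelled out in the text, so the rule below (choosing the number of azimuthal points so that neighboring points are about one grid step apart along the arc) is an assumption, as are the function names and the example numbers. The product form is written as the square of the probability sum, on the reading that the D-D' and A-A' sums are equal by symmetry.

```python
import math

ORIENTATIONS = range(0, 360, 60)  # rotation about the NDD line, in 60-degree steps


def cylindrical_grid(z_max, r_max, step=10.0):
    """Yield (z, r, phi_deg) grid points spaced roughly `step` angstroms apart.

    The azimuthal spacing rule is an assumption of this sketch.
    """
    for iz in range(int(z_max // step) + 1):
        for ir in range(int(r_max // step) + 1):
            r = ir * step
            # Number of azimuthal points so that the arc between neighbors is about `step`.
            n_phi = max(1, round(2.0 * math.pi * r / step))
            for ip in range(n_phi):
                yield (iz * step, r, 360.0 * ip / n_phi)


def off_axis_feasibility(placements):
    """Return (F2, F4) from (n_clashes, p1, p2) tuples for NDDs connecting D and D'."""
    total = sum(p1 * p2 for n_clashes, p1, p2 in placements if n_clashes == 0)
    # Doubling accounts for the symmetry-related A-A' connection.
    return 2.0 * total, total * total


if __name__ == "__main__":
    print(len(list(cylindrical_grid(0.0, 10.0))))
    print(off_axis_feasibility([(0, 0.1, 0.2), (1, 0.5, 0.5), (0, 0.3, 0.1)]))
```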

Scaling on-and off-axis feasibility measures
The feasibility of forming a tetramer with a particular CTCD value is the weighted sum of those with NDD at the on- and off-axis positions. The weights were determined so as to minimize the difference between the weighted sum and the experimentally measured number of tetramer binding sites. Thus, we determined the weights $w_j$, where j = 1 or 2 for the on- or off-axis feasibilities, respectively, which minimize $S = \sum_i (c_i - n_i)^2$, where $c_i = \sum_j w_j F_{ji}$ is the weighted sum for the CTCD value i, $F_{ji}$ is the calculated feasibility measure for the on- (j = 1) or off- (j = 2) axis position at the CTCD value i, and $n_i$ is the experimentally measured number of tetramer binding sites at the CTCD value i. The solution of the minimization problem is $w = a^{-1} b$, with $a_{jk} = \sum_i F_{ji} F_{ki}$ and $b_j = \sum_i F_{ji} n_i$. The computed w values are 690.9 and 307.4, respectively, for the on- and off-axis F2 and 89064.8

Table L occupies columns AH-AN and rows 44-62. Table M occupies columns AH-AN and rows 66-83. An entry in these tables is 1 or 0 if the corresponding entry in tables F and G is 0 or non-zero, respectively. Tables N, O, P, and Q give p(x, ϕ) values described in the main text.
Table B corresponding to the distances dist1 and dist2, respectively. Column J: Column G converted to 1 when there is no clash or 0 otherwise. Column K: Product of columns H, I and J. These are the p(z, r, ϕ, θ) values described in the main text. The last row in column K gives the sum of all entries of column K. This is the probability (sum of p(z, r, ϕ, θ)) that an off-axis NDD will connect a pair of non-symmetry related monomers.
Fig 7). The same table is given on each sheet of the Excel file for the off-axis NDD. Column O: End-to-end distance of the peptide, in Å truncated to an integer. Column P: Number of 13-mer peptides in PDB, with the BLOSUM62 score of 8 or greater when compared to the sequence of the 13-mer linker peptide in STAT5A, with the end-to-end distance in the given bin.
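The scaling of the on- and off-axis feasibility measures described in this section is an ordinary two-parameter least-squares fit, which can be sketched by solving the 2 × 2 normal equations directly. The function name and the example data are hypothetical; the sketch is not the code used to obtain the reported weights.

```python
def fit_weights(f_on, f_off, n_obs):
    """Least-squares weights (w1, w2) minimizing sum_i (w1*f_on[i] + w2*f_off[i] - n_obs[i])^2."""
    # Elements of the symmetric matrix a and of the vector b in w = a^-1 b.
    a11 = sum(f * f for f in f_on)
    a12 = sum(x * y for x, y in zip(f_on, f_off))
    a22 = sum(f * f for f in f_off)
    b1 = sum(f * n for f, n in zip(f_on, n_obs))
    b2 = sum(f * n for f, n in zip(f_off, n_obs))
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("on- and off-axis feasibility profiles are linearly dependent")
    # Explicit inverse of the 2 x 2 matrix applied to b.
    w1 = (a22 * b1 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


if __name__ == "__main__":
    # Hypothetical profiles constructed so that the exact weights are 3 and 5.
    print(fit_weights([1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 5.0, 11.0]))
```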