Solution and Crystallographic Structures of the Central Region of the Phosphoprotein from Human Metapneumovirus

Human metapneumovirus (HMPV) of the family Paramyxoviridae is a major cause of respiratory illness worldwide. Phosphoproteins (P) from Paramyxoviridae are essential co-factors of the viral RNA polymerase that form tetramers and possess long intrinsically disordered regions (IDRs). We located the central region of HMPV P (Pced), which is involved in tetramerization, using disorder analysis and modeled its 3D structure ab initio using Rosetta fold-and-dock. We characterized the solution structure of Pced using small angle X-ray scattering (SAXS) and carried out direct fitting to the scattering data to filter out incorrect models. Molecular dynamics simulations (MDS) and ensemble optimization were employed to select correct models and capture the dynamic character of Pced. Our analysis revealed that oligomerization involves a compact central core located between residues 169-194 (Pcore) that is surrounded by flexible regions with α-helical propensity. We crystallized this fragment and solved its structure at 3.1 Å resolution by molecular replacement, using the folded core from our SAXS-validated ab initio model. The RMSD between modeled and experimental tetramers is as low as 0.9 Å, demonstrating the accuracy of the approach. A comparison of the structure of HMPV P to existing Mononegavirales Pced structures suggests that Pced evolved under weak selective pressure. Finally, we discuss the advantages of using SAXS in combination with ab initio modeling and MDS to solve the structure of small, homo-oligomeric protein complexes.


Introduction
Human metapneumovirus (HMPV) is a major cause of acute respiratory diseases in children, the elderly and immunocompromised patients worldwide [1][2][3][4][5]. HMPV belongs to the Pneumovirinae subfamily of the Paramyxoviridae and is further classified into the genus Metapneumovirus [6]. HMPV is an enveloped virus that forms pleomorphic or filamentous virions. Its genome consists of a ~13-kb single-stranded RNA molecule of negative polarity that encodes 9 proteins in the order 3'-N-P-M-F-M2(-1)/(-2)-SH-G-L-5'. HMPV proteins show detectable levels of sequence identity to the respiratory syncytial virus (RSV) (genus Pneumovirus); however, the order of the genes is different and HMPV lacks the NS1 and NS2 genes present in RSV. For all paramyxoviruses, the nucleoprotein (N) encapsidates viral RNA, leading to an N-RNA complex which, together with the RNA-dependent RNA polymerase (L) and the phosphoprotein (P), forms the viral replication complex. The P protein is thought to be responsible for the recruitment of the large polymerase L onto the viral N-RNA template through direct interactions with N and L [7][8][9][10][11][12][13][14][15]. In addition, P chaperones the nascent N, which is sequestered in the form of an RNA-free NP complex [16,17]. The M2 gene is specific to the Pneumovirinae subfamily, and possesses two overlapping open reading frames encoding two proteins, the antitermination/transcription-elongation factor M2-1, which is required for viral transcription [18], and the RNA synthesis regulatory factor M2-2 [19].
For all members of the Paramyxoviridae family, the P protein is an intrinsically disordered polypeptide which forms tetramers through a central α-helical coiled-coil region. Available structures of the tetrameric coiled-coil from Sendai virus (SeV) [20] and Measles virus (MeV) [21] show long parallel arrangements of twisted α-helices. However, the structure of the Mumps virus phosphoprotein strikingly reveals the formation of parallel dimers that further assemble into tetramers by associating in an antiparallel fashion [22]. In contrast, the tetramerization domain of the RSV P protein, which is the closest homologue of HMPV P to have been structurally characterized, displays a much shorter coiled-coil region, termed fragment Y*. This fragment was previously identified via proteolytic digestion and subsequently mapped to residues 119 to 160 by mass spectrometry and N-terminal sequencing [23,24]. Interestingly, although the HMPV P sequence is longer than that of RSV P by 53 residues and the overall sequence identity is only 28%, conservation is considerably higher in the central structured region of the protein, suggesting similar tetramerization domains.
In this study, we applied bioinformatics approaches to locate the central folded region of HMPV P, and used symmetric homo-oligomeric ab initio modeling in combination with small angle X-ray scattering (SAXS) and molecular dynamics simulations (MDS) to determine the structure of the central region of HMPV P (P ced ) and capture its flexibility in solution.
We used the obtained model to solve the crystal structure of the core region of P ced (residues 168-194) by molecular replacement. We analyze the implications of the structure of P ced for virus function and evolution, and discuss the usefulness of integrative approaches to protein structure determination.

Disorder analysis locates the central structured region of P ced
We used meta-disorder predictions in combination with sequence conservation and secondary structure propensity to locate IDRs and folded regions of HMPV P (Figure 1). The analysis predicts the presence of a central, highly conserved region with α-helical propensity located between residues 158 and 237, which we refer to as P ced . The N-terminal and C-terminal regions flanking P ced are mostly disordered and weakly conserved, with the notable exception of the first 30 residues, which show a narrow peak of conservation and predicted order, suggesting the presence of an α-helical molecular recognition element (MoRE), as has been described for P proteins from other members of the Paramyxoviridae [16] and Rhabdoviridae families [25].

Structural characterization of P ced by SAXS
P ced was expressed in E. coli and purified, and its structure was characterized using SAXS (Figure 2 A). The samples were free from aggregates, as evidenced by the linearity of the Guinier region (Figure 2 B). The parameters derived from SAXS data are summarized in Table 1. Radii of gyration (R g ) were independent of protein concentration, and only moderately affected by salt concentration (R g = 3.26 ± 0.02 nm in 150 mM NaCl vs 3.17 ± 0.04 nm in 800 mM NaCl). However, a significant drop in R g to a value of 2.98 ± 0.03 nm was observed in the presence of 1 M non-detergent sulfobetaine 201 (NDSB-201), suggesting an induced stabilization of P ced structure. Molecular weights (MW) were estimated based on calculation of the concentration-independent volume of correlation V c , as defined in [26], yielding values between 28 and 34 kDa, in agreement with the MW calculated from the amino-acid sequence, assuming a tetramer (8.8 x 4 = 35 kDa).
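The Guinier analysis mentioned above can be illustrated with a short Python sketch. It assumes the standard Guinier approximation, ln I(q) = ln I(0) - (R g ² / 3) q², fitted over the range q·R g ≤ 1.3; the function name `guinier_fit`, its parameters and the iterative choice of the fitting range are illustrative assumptions and do not reproduce the processing actually used for Table 1.

```python
import numpy as np

def guinier_fit(q, intensity, q_rg_max=1.3, min_points=5, max_iter=50):
    """Estimate Rg and I(0) from ln I(q) = ln I(0) - (Rg^2 / 3) q^2.

    q must be sorted in ascending order; Rg is returned in the inverse
    unit of q (nm if q is in nm^-1). Hypothetical helper for illustration.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = len(q)  # start with every point, then shrink to the Guinier range
    for _ in range(max_iter):
        # Straight-line fit of ln I against q^2 on the first n points
        slope, intercept = np.polyfit(q[:n] ** 2, np.log(intensity[:n]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; Rg is undefined")
        rg = np.sqrt(-3.0 * slope)
        # Keep only the points that satisfy q * Rg <= q_rg_max
        n_new = max(int(np.sum(q * rg <= q_rg_max)), min_points)
        if n_new == n:
            break
        n = n_new
    return rg, float(np.exp(intercept))
```

A linear Guinier plot, as reported for the samples here, corresponds to a stable slope across the fitted range; upward curvature at the lowest angles would typically indicate aggregation.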

Modelling of P ced using Rosetta Fold-and-dock and SAXS-based model selection
We employed the Rosetta fold-and-dock application [27] to model the structure of P ced tetramers. We generated 2 x 30,000 models using the sequence of residues 155-241 or 156-237. The use of two different sequence lengths leads to an increased structural diversity of sampled models, due to the fragment-based approach implemented in Rosetta. Moreover, the effects of possibly truncating the predicted α-helix (α3) located between residues 207-241 are taken into account (Figure 1). Both ensembles were ranked according to their Rosetta free energy score, and the five best models of each ensemble are shown in Figure 3 A and B. Interestingly, all models, apart from a single oblate model, form a tetrameric coiled-coil through the arrangement of α-helices (α2) typically ranging from residues 168 to 198. We refer to this central region as P core . The regions comprising residues 158 to 168 and 200 to 237 display a tendency towards α-helical conformations, but adopt different orientations in each model, resulting in important changes to the overall shape of the predicted models. This lack of convergence suggests that these regions might not assume a single conformation in solution, but rather exist as disordered ensembles.
In a second step, we calculated theoretical SAXS profiles for all of the Rosetta ensembles and fitted them to the experimental SAXS profile, yielding an additional score for each model in the form of a χ exp value, which measures the discrepancy between theoretical and experimental SAXS profiles. The best results were obtained using data measured in the presence of 1 M NDSB-201, because of the increase in conformational stability it induces, as evidenced by the significant drop in R g (Table 1). We filtered out all models displaying χ exp values higher than 1.3, thus eliminating more than 90% of the models, and then ranked the remaining models according to their Rosetta score. The five best models from each ensemble are shown in Figure 3 C & D. Strikingly, we observe that the experimental SAXS profile imposes strong shape constraints on the models, leading to a more homogeneous ensemble (Figure 3 C & D, as opposed to Figure 3 A & B). With the exception of an oblate model which ranked third (not shown) and model 4, all models display a relatively similar coiled-coil arrangement of α-helices encompassing residues 168 to 198 (P core ), while residues 158-167 and residues 199-237 adopt various, mostly α-helical structures.
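The two-step selection described above (discard models with χ exp above 1.3, then rank the survivors by Rosetta score) can be sketched in a few lines of Python. The `Model` class, the `filter_and_rank` helper and any example values are hypothetical and serve only to illustrate the logic; the sketch assumes that a lower Rosetta score is better.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str             # hypothetical model identifier
    rosetta_score: float  # Rosetta energy; lower is assumed to be better
    chi: float            # discrepancy to the experimental SAXS profile

def filter_and_rank(models, chi_max=1.3, top=5):
    """Drop models with chi above chi_max, then rank by Rosetta score."""
    kept = [m for m in models if m.chi <= chi_max]
    return sorted(kept, key=lambda m: m.rosetta_score)[:top]
```

Applying the SAXS filter before the energy ranking means that a model with an excellent score but an incompatible overall shape can never reach the final selection.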

MDS confirm the stability of P core and the flexibility of the flanking regions
We tested the stability of the 10 best-fitting ab initio models via classical explicit-solvent molecular dynamics simulations (MDS), by performing duplicate runs of approximately 200 ns. The analysis of root mean square fluctuations (RMSF) along the sequence confirms the stability of the coiled coil region, with RMSF values centred around 2 Å for the P core region. The flanking regions (residues 158-168 and 200-237) display higher RMSFs, indicating instability of the packing in most models (Figure 4 F), and further supporting the hypothesis that these residues are flexible in solution. Additionally, model 4, which displays a different P core structure, is readily identified as an outlier due to its higher flexibility in the P core region. The RMSD of P core was calculated with respect to the starting structure, showing that model 1L displays the most stable P core structure (Figure 4 A). P core from model 1L was then used as a reference structure to study the conformational behavior of the other models during MDS (Figure 4).

Figure 1 (caption, partially recovered). The predicted propensity to adopt ordered structures is represented along the amino-acid sequence (black line), together with the conservation score (red line), calculated using AL2CO [76]. The location of the predicted secondary structure elements and the identity of the cloned construct are shown above the graphs.
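The RMSF analysis used in this section can be illustrated with a minimal NumPy sketch. It assumes that the trajectory has already been superposed on a common reference and is stored as an array of shape (n_frames, n_atoms, 3); the function name `rmsf` is an illustrative choice, and the study itself relied on GROMACS routines rather than on this code.

```python
import numpy as np

def rmsf(trajectory):
    """Per-atom root mean square fluctuation.

    trajectory: array of shape (n_frames, n_atoms, 3), assumed to be
    superposed on a common reference beforehand.
    """
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                    # (n_atoms, 3) average structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # (n_frames, n_atoms) squared deviations
    return np.sqrt(sq_dev.mean(axis=0))             # (n_atoms,) RMSF per atom
```

Plotting the result against residue number gives the kind of profile that separates a rigid core from flexible flanking regions.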

SAXS-based ensemble analysis captures the dynamic character of P ced in solution
The time-averaged R g of all simulated models ranges from 2.70 to 2.98 nm (Table 2), which is lower than the measured R g for P ced (3.26 nm) by at least 3 Å, clearly indicating that P ced possesses IDRs that are not adequately modeled in our classical MDS. Interestingly, data measured in the presence of 1 M NDSB-201 show an R g of 2.98 nm, suggesting a more stable fold of P ced in these conditions, consistent with the lower χ exp values observed for fitting of the Rosetta models (not shown). In order to explicitly model the flexibility of P ced in solution, we employed the ensemble optimization method (EOM) [28]. To take into account the possibility of IDRs outside P core , we used atomistic structure-based models (SBM) [29] as a means to rapidly sample the conformational space of residues 158-167 and 199-237 and impose extended or α-helical conformations. We then pooled all models from classical MDS and SBM MDS into an ensemble of ~12,300 models, which we fitted against SAXS data using EOM [28] to yield optimized ensembles (Figure 5). A representative ensemble of ten conformers is shown in Figure 5 B. SAXS profiles could be adequately fitted using models from the pool ensemble, with χ exp values reaching a plateau at 0.8-0.9 for an ensemble size of 3 to 5 models (Figure 5 C and D), providing direct evidence for the presence of flexible regions in P ced [30]. Interestingly, the drop in χ exp value upon increasing ensemble size was less pronounced for the NDSB-containing sample, confirming the induced increase in protein fold stability, as has been observed previously [31]. The R g distributions of the optimized ensembles measured in low or high salt conditions, or in the presence of 1 M NDSB-201, are shown in Figure 5 A, revealing a dynamic equilibrium between two populations that correspond to the presence or absence of α-helices outside the core region.
Interestingly, the addition of NDSB-201 correlated with an increase in the percentage of α-helices in the optimized ensemble ( Figure 5 A).
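For orientation, a time-averaged R g of the kind compared with the SAXS values in this section can be computed from atomic coordinates as sketched below. The function names are illustrative assumptions. The sketch returns a purely geometric (optionally mass-weighted) value; it does not account for the hydration layer or scattering contrast, so it is only an approximation of an R g derived from SAXS data.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Geometric radius of gyration of one conformer (same unit as coords)."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))   # equal weights if no masses are given
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

def time_averaged_rg(trajectory, masses=None):
    """Mean radius of gyration over frames of shape (n_frames, n_atoms, 3)."""
    return float(np.mean([radius_of_gyration(f, masses) for f in trajectory]))
```

Computing the per-frame values rather than only their mean also yields the R g distribution of a simulated pool, which can be compared with the distribution of an optimized ensemble.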

Crystal structure of P core
A single diffracting crystal of HMPV P ced grew after 140 days, suggesting degradation occurred in the drop prior to crystallization. The crystal belonged to space group P2₁2₁2₁ (Table 3). Diffraction data were phased by molecular replacement using residues 169 to 194 from P core , revealing a tetrameric α-helical arrangement consistent with the modeled structure (Figure 6 A and B). The asymmetric unit contains 2 tetramers, with a solvent content of 40%. Since the asymmetric unit could not physically contain 2 tetramers with residues 158-237 (with 1 tetramer present the solvent content is 16%), packing considerations suggest that degradation necessarily occurred prior to crystallization. Model 1 was selected for molecular replacement because it displayed the lowest RMSF in the core α-helical region during MDS (Figure 4 F). The stability of the P core region in ab initio models, MDS and SAXS prompted us to use residues 168 to 198 as a search model; however, this resulted in rejection of a potential solution with a high translation function Z-score (TFZ) due to steric clashes between tetramers, which were eliminated by using a shorter fragment encompassing residues 169 to 194.

Figure 3. Ab initio models of P ced before (A&B) and after (C&D) applying the SAXS filter. A. From left to right, the 5 best-scoring Rosetta fold-and-dock models generated from the sequence of HMPV P residues 155 to 241. Models are shown in cartoon and are coloured from blue (N-terminus) to red (C-terminus). B. 5 best-scoring Rosetta fold-and-dock models from HMPV P residues 156 to 237. C and D. Same as in A and B after filtering out all models with χ exp > 1.3. All models were truncated to residues 156-237 to improve fitting accuracy (residues 156-157 were kept to account for the two extra N-terminal residues resulting from cleavage of the His6 tag by 3C protease).
The structure was subsequently refined to R work and R free values of 23.5% and 25.2%, respectively, confirming the identity of the modeled residues. The residues 158-168 and 195-237 are absent from the structure, in line with the flexibility observed for these regions by MDS and SAXS-based ensemble optimization. The tetrameric structure is stabilized by a large network of hydrophobic interactions involving Leu176, 183, 187, 189, 190, 193 and Ile172, 179, 186 located on the inner surface of symmetry-related α-helices (Figure 6 C). Additional stability is provided by a solvent-exposed network of ionic interactions created by Glu173/Glu177 and Arg175, or Glu180 and Lys182 side chains from neighbouring protomers. Comparison with representative models from the SAXS optimized ensembles, as well as with the best scoring models from MDS (Figure 3 C & D), demonstrates Cα RMSDs ranging from 0.9 to 1.6 Å (Figure 7) over aligned residues, confirming the accuracy of the modeled P core .
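A Cα RMSD after optimal superposition, as quoted above, can be computed with the Kabsch algorithm. The sketch below is a generic NumPy illustration and assumes two equally sized, residue-matched coordinate arrays; it is not the PyMOL routine that was used to obtain the reported values, and the function name `kabsch_rmsd` is an assumption.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched point sets P and Q (n, 3) after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)   # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc             # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T          # rotate the centred P onto the centred Q
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))
```

Because the result depends on which residues are included, an RMSD is only meaningful together with the aligned residue range, here the P core residues present in the crystal structure.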

Discussion
The structure of P ced reveals the shortest tetrameric coiled-coil among the Paramyxoviridae
The structural data presented here indicate that the α-helical tetramerization domain of HMPV P (residues 171-194) is considerably shorter than the highly conserved central region of the molecule (residues 158-237). Interestingly, the region 195-237 is shown by SAXS-based ensemble optimization to form an IDR with strong α-helical propensity. These transiently folded α-helices can be further stabilized by addition of NDSB-201. Taken together, these features suggest that residues 195 to 237 might constitute a molecular recognition element (MoRE) located directly downstream of the coiled coil region. A sequence alignment of HMPV P with HRSV and BRSV P is shown in Figure 8. The regions that have been mapped in RSV to be required for interaction with the N, L and M2-1 proteins are annotated based on published mutagenesis studies [7,9,11,14,32]. Interestingly, the sequence of P core aligns with a region of RSV P that is necessary for coimmunoprecipitation of the L protein [11], suggesting either direct binding to L or the requirement of a tetrameric P protein for efficient P-L association. Additionally, the α-helical MoRE located between HMPV P residues 195 to 237 shows strong conservation and overlaps with a putative nucleoprotein (N) binding region identified in RSV (residues 161-180). Residues 221 to 241 of RSV P are also part of a putative N binding site, which aligns with residues 257 to 277 of HMPV featuring the predicted α4 helix (Figures 1 and 8). The flexibility of α3 (residues 195 to 237) relative to α2 (P core ) observed in the SAXS ensembles, together with the requirement of α3 and α4 for N binding, suggests that these regions may be part of a nucleoprotein-binding domain (NBD), similar to the C-terminal domain of phosphoproteins from other Mononegavirales [33][34][35][36][37].
Alternatively, the C-terminal region of HMPV P may act as a MoRE that folds upon binding to N, and assume an unstable tertiary structure in the absence of a binding partner. This hypothesis is supported by the low propensity to form ordered structures observed for the C-terminal region of HMPV P (Figure 1). P proteins from Mononegavirales are large modular proteins that are characterized by extensive IDRs of variable lengths containing multiple MoREs, a central oligomerization domain and, in some but not all viruses, a stable C-terminal domain [13][38][39][40][41][42]. Because P protein sequences vary greatly in length (from 241 residues in RSV P to 709 residues in Nipah virus P), and have diverged beyond remote homology detection, it is difficult to compare P proteins from different families, or sometimes even different genera. However, the available structures of P oligomerization domains allow us to determine phylogenetic relationships from structural alignments. Figure 9 shows a phylogenetic tree of the crystal structures of tetrameric Paramyxoviridae P and dimeric Rhabdoviridae P oligomerization domains, built using the structure homology program SHP [43]. Interestingly, the tree obtained from P ced structures is similar to trees built based on large numbers of sequences from more conserved proteins such as N, M, F and L [44]. The tree highlights the structural divergence of P protein oligomerization domains across evolution, with more than three-fold variation in domain length across the Paramyxoviridae. The short tetrameric coiled coil from HMPV P clusters in a separate branch from SeV/MuV/MeV and RV/VSV, emphasizing the pertinence of its classification as a separate subfamily, Pneumovirinae.

Tetramerization Domain of Human Metapneumovirus P PLOS ONE | www.plosone.org

Figure 5 (caption, partially recovered). Representative conformers populating the two peaks of the R g distribution are shown above the graph in cartoon representation and coloured from blue (N-terminus) to red (C-terminus). B. An ensemble of 10 conformers selected against the NDSB-containing sample, highlighting the dynamic equilibrium between random coil and α-helical conformations of the C-terminal region. C. EOM-fitted SAXS profiles of P ced are represented by red lines. The colour-coding for the experimental curves is the same as in [remainder of caption missing].

The combined use of ab initio modeling with small angle X-ray scattering and molecular dynamics appears to be a promising approach for solving the structure of small homo-oligomeric protein complexes
The work presented in this study combines several computational techniques commonly used in protein structure prediction to yield a correct model for an unknown protein, applying low resolution information about shape and flexibility derived from SAXS as the sole experimental constraint. The advantages of the method can be summarized in four points. (1) The recently developed Rosetta fold-and-dock protocol takes advantage of the reduced conformational space available to homo-oligomeric proteins to predict atomistic models [27].
(2) SAXS can be used to successfully filter out a large proportion of incorrect models, as has been shown in ab initio protein structure prediction [44][45][46] and protein-protein docking [47,48]. (3) The usefulness of SAXS data to identify correct models can be increased by extracting information about protein flexibility and disorder through ensemble analysis [28,30], thus tackling the challenges associated with the modeling of partially unstructured proteins. (4) Classical MDS provide an additional means of selecting and optimizing correct models and detecting flexible regions [49], while their sampling limitations can be overcome by using fast SBM MDS [29,50]. By combining methods (1) to (4), we obtained a detailed, cross-validated picture of HMPV P ced structure and dynamics in solution, showing that this combination constitutes a promising approach for protein structure determination. The crystal structure of the core region of P ced validates the accuracy of the model, and indirectly confirms the flexibility of the degraded flanking regions. The combined use of ab initio modeling, MDS and SAXS-based ensemble optimization constitutes a generally applicable method to predict protein structure in the presence of stable, potentially homo-oligomeric domains as well as transiently structured or completely disordered regions, and should become increasingly useful in the future.

Sequence-based analyses
Computational meta-disorder predictions and conservation scores based on sequence alignment of sequences from Pneumovirinae P were calculated following procedures described in [41]. Consensus secondary structure prediction was obtained from the Dismeta webserver [51].

Protein cloning, expression & purification
The region of the HMPV P gene from strain NL1-00 corresponding to residues 158-237 was amplified by PCR and cloned into pOPINF [52] for expression of P with an N-terminal His6 tag followed by a 3C cleavage site, using a proprietary ligation-independent In-Fusion system (Clontech), following standard procedures. The integrity of the cloned construct was checked by nucleotide sequencing.
The His6-3C-P158-237 construct was expressed in Rosetta2™ E. coli cells by overnight incubation under shaking at 17°C following 1 mM IPTG induction of 1 L terrific broth in the presence of appropriate antibiotics. Cells were harvested by centrifugation (18°C, 20 min, 4,000 x g). The resulting cell pellets were resuspended in 20 mM Tris, pH 7.5, 150 mM NaCl, 8 M urea. Cells were lysed by sonication, and the lysate was centrifuged for 45 min at 4°C and 50,000 x g to remove cell debris. The supernatant was filtered (0.45 μm filter) and loaded on a column containing 2 ml of pre-equilibrated Ni-NTA Agarose (QIAGEN). After extensive washes, the protein was eluted in 20 mM Tris, pH 7.5, 150 mM NaCl, 400 mM imidazole. The protein was then subjected to size exclusion chromatography on a S200 column equilibrated in 20 mM Tris, pH 7.5, 1 M NaCl. The His6 tag was removed by addition of 3C protease at 4°C for 72 h. The cleaved product was further purified through reverse Ni-NTA purification to remove His-tagged 3C protease, followed by an additional gel filtration step in 20 mM Tris, pH 7.5, 150 mM NaCl. The protein was concentrated using a Millipore concentration unit (10 kDa cut-off).

Small angle X-ray scattering experiments
Small angle X-ray scattering measurements of P ced (residues 158 to 237) were performed at the BM29 beamline of the European Synchrotron Radiation Facility (ESRF), Grenoble, France. Data were collected at 20°C, a wavelength of 0.0995 nm and a sample-to-detector distance of 1 m. 1D scattering profiles were generated and blank subtraction was performed by the data processing pipeline available at BM29 at the ESRF.

Computational modeling of P ced
The amino-acid sequences of P residues 155-241 or 156-237 were used as input to the Rosetta fold-and-dock protocol with the default recommended parameters [27,53,54]. 2 x 30,000 models were generated and ranked using the Rosetta scoring function. In a second step, models were fitted to experimental SAXS data using CRYSOL [55] to yield χ exp , the agreement between theoretical and experimental profiles. χ exp values were then used to discard incorrect models based on an arbitrary threshold (χ exp > 1.3).
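To make the meaning of χ exp concrete, the following Python sketch computes a discrepancy between an experimental and a theoretical profile using a commonly used reduced χ definition with a least-squares scale factor. This is an assumption made for illustration: CRYSOL additionally adjusts hydration-shell and excluded-volume parameters during fitting, which this sketch omits, and the function name `saxs_chi` is hypothetical.

```python
import numpy as np

def saxs_chi(i_exp, sigma, i_calc):
    """Discrepancy between experimental and calculated intensities.

    chi^2 = 1/(N-1) * sum(((I_exp - c * I_calc) / sigma)^2), where the
    scale factor c is obtained by weighted least squares. All inputs are
    sampled on the same q grid.
    """
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)
    w = 1.0 / sigma ** 2
    scale = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc ** 2)
    resid = (i_exp - scale * i_calc) / sigma
    chi2 = np.sum(resid ** 2) / (len(i_exp) - 1)
    return float(np.sqrt(chi2)), float(scale)
```

With realistic error estimates, values close to 1 indicate agreement within experimental noise, which is why a threshold slightly above 1 is a natural, if arbitrary, choice for discarding models.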

Molecular dynamics simulations and ensemble optimization
All classical MDS were performed using the GROMACS 4 software package [56] and the AMBER99SB-ILDN* force field [57,58]. At the beginning of each simulation, the protein was immersed in a box of SPC/E water. A minimum distance of 1.0 nm was applied between any protein atom and the edges of the box. Sodium ions were added to reach neutrality. Long range electrostatics were treated with the particle-mesh Ewald summation [59]. Bond lengths were constrained using the P-LINCS algorithm [60]. Hydrogens were treated as virtual sites [61], enabling an integration time step of 5 fs. The v-rescale thermostat [62] and the Parrinello-Rahman barostat [63] were used to maintain a temperature of 300 K and a pressure of 1 atm. Each system was energy minimized using 1,000 steps of steepest descent and equilibrated for 200 ps with restrained protein heavy atoms before the beginning of the production simulation. For each system, two independent production simulations were obtained by using different initial velocities. The aggregated simulation time was ~4.15 µs (Table 2). Calculations of root mean square deviations (RMSD) and root mean square fluctuations (RMSF) were carried out using GROMACS routines. Snapshots were extracted every 500 ps, resulting in a pool of ~8,300 models.

Figure 6 (caption, partially recovered). Structure of the asymmetric unit, showing the close packing of two tetrameric molecules. The left tetramer is shown in grey, and the right one is coloured from blue (N-terminus) to red (C-terminus). C. Structure of a single subunit from the crystal, highlighting the residues involved in intermolecular contacts. Hydrophobic residues are coloured in orange, while positively and negatively charged residues are coloured in blue and in red, respectively. doi: 10.1371/journal.pone.0080371.g006

Figure 7. Comparison of the crystal structure of P core with a minimal structural ensemble of P ced . Left panel, the crystal structure from P core (purple cartoon) is overlaid with an optimized ensemble of 5 models of P ced (coloured from blue (N-terminus) to red (C-terminus)) selected from SAXS data measured in 20 mM Tris pH 7.5, 150 mM NaCl. Right panel, the crystal structure is additionally shown in purple cartoon. The observed range of Cα-RMSD values is indicated (calculated in Pymol).
In order to obtain a more complete sampling of the IDRs, in particular those located between residues 195 and 237, model 1 was simulated using an atomistic coarse-grained structure-based model (SBM) [29,50]. Model 1 was selected because of the higher stability of the α-helical fold adopted by residues 195 to 220, as shown by its low RMSF (Figure 4), and also its higher frequency of selection in optimized ensembles (not shown). Two additional systems were simulated with either residues 220 to 237, or residues 195 to 237, in extended starting conformations, allowing fast sampling of the IDR motions. 2,000 snapshots were extracted from each simulation, adding 4,000 models to the pool.
For each model from the pool ensemble (~12,300 models), theoretical SAXS patterns were calculated with the program CRYSOL [55] and ensemble fitting was performed with GAJOE [28]. The number of models in the selected ensemble was varied from 1 to 20 in order to determine the size of the minimal ensemble required to describe the data.
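The principle of ensemble fitting can be illustrated with the Python sketch below, which averages the theoretical profiles of selected pool members and scores the average against the data. GAJOE relies on a genetic algorithm; the greedy forward selection used here is a deliberately simpler stand-in chosen for brevity, and the helper names `chi` and `greedy_ensemble` as well as the χ definition (reduced χ with a least-squares scale factor) are assumptions.

```python
import numpy as np

def chi(i_exp, sigma, i_calc):
    """Reduced chi between data and a calculated profile (scale fitted)."""
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc ** 2)
    return float(np.sqrt(np.sum(((i_exp - c * i_calc) / sigma) ** 2) / (len(i_exp) - 1)))

def greedy_ensemble(profiles, i_exp, sigma, size):
    """Select `size` pool members (repeats allowed) whose averaged profile
    fits the data best, adding one member at a time.

    profiles: array (n_models, n_q) of theoretical intensities.
    Greedy stand-in for a genetic algorithm; not the GAJOE procedure.
    """
    profiles = np.asarray(profiles, dtype=float)
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    chosen = []
    total = np.zeros(profiles.shape[1])
    for k in range(1, size + 1):
        trial_means = (total + profiles) / k   # average if each model were added
        scores = [chi(i_exp, sigma, m) for m in trial_means]
        best = int(np.argmin(scores))
        chosen.append(best)
        total += profiles[best]
    return chosen, chi(i_exp, sigma, total / size)
```

Running such a selection for increasing ensemble sizes and recording the final χ reproduces the logic of the analysis described above: the size at which χ stops improving indicates the minimal ensemble needed to describe the data.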

Crystallization and data collection
Crystallization was carried out via the vapor diffusion method using a Cartesian Technologies pipetting system [64]. The P158-237 construct crystallized after ~142 days in 25% PEG 3350, 100 mM HEPES pH 7.5 at 20°C. Crystals were frozen in liquid nitrogen after being soaked in a mother liquor solution supplemented with 25% glycerol. Diffraction data were recorded on the I04 beamline at Diamond Light Source, Didcot, UK.

Figure 8. Annotated sequence alignment of the conserved C-terminal region of HMPV, HRSV and BRSV P. The location of the tetramerization domain (P core ) and the position of conserved residues are highlighted. Regions that have been associated with N, M2-1 or L interaction in RSV are indicated by a bar based on [7,9,11,14,32].

Figure 9 (caption, partially recovered). [...] [43]. The obtained evolutionary distances were used to draw a tree in PHYLIP [77].

Structure determination and refinement
Anisotropic diffraction data to a resolution of 3.1 Å were indexed and integrated using XDS [65] and scaled with SCALA [66] as implemented in the program xia2 [67]. The structure was determined by molecular replacement using P core residues from model 1 as a search model in PHASER [68]. The solution was subjected to repeated rounds of restrained refinement in PHENIX [69] and Autobuster [70] and manual building in COOT [71]. Eight-fold non-crystallographic local structure similarity restraints [72] were used throughout refinement and TLS parameters were included in the final round of refinement. The CCP4 program suite [73] was used for coordinate manipulations. The structure was validated with Molprobity [74]. Refinement statistics are given in Table 3, and final refined coordinates and structure factors have been deposited in the PDB with accession code 4BXT.

Structure analysis
All the structure-related figures were prepared with the PyMOL Molecular Graphics System (DeLano Scientific LLC). Protein interfaces were analyzed with the PISA webserver [75]. Structural alignments were calculated using PyMOL and SHP [43].