Structural Characterization of Minor Ampullate Spidroin Domains and Their Distinct Roles in Fibroin Solubility and Fiber Formation

Spider silk is a protein fiber with extraordinary mechanical properties. It is still poorly understood how silk proteins are kept in a soluble form before spinning into fibers and how the protein molecules are aligned in an orderly manner to form fibers. Minor ampullate spidroin is one of the seven types of silk proteins and consists of four types of domains: N-terminal domain, C-terminal domain (CTD), repetitive domain (RP) and linker domain (LK). Here we report the tertiary structure of CTD and secondary structures of RP and LK in aqueous solution, and their roles in protein stability, solubility and fiber formation. The stability and solubility of individual domains are dramatically different and can be explained by their distinct structures. For the tri-domain miniature fibroin, RP-LK-CTDMi, the three domains have no or weak interactions with one another at low protein concentrations (<1 mg/ml). The CTD in RP-LK-CTDMi is very stable and soluble; it cannot stabilize the entire protein against chemical and thermal denaturation, but it can keep the entire tri-domain protein in a highly water-soluble state. In the presence of shear force, protein aggregation is greatly accelerated and the aggregation rate is determined by the stability of folded domains and solubility of the disordered domains. Only the tri-domain RP-LK-CTDMi could form silk-like fibers, indicating that all three domains play distinct roles in fiber formation: LK as a nucleation site for assembly of protein molecules, RP for assistance of the assembly and CTD for regulating alignment of the assembled molecules.


Introduction
Spider silk is an ideal super material due to its extraordinary mechanical properties compared to other available materials [1][2][3]. However, mass production of spider silk from nature is still impossible because the farming of spiders is limited by their cannibalistic and territorial behavior [4]. Thus, producing spider silk by recombinant biotechnology is one of the most promising alternatives [3][5][6][7]. Before achieving this goal, one needs to understand the molecular structures, self-assembly mechanism and fiber formation of spider silk proteins, which are affected by solvent environment and shear and elongational forces [3]. Female orb-weaving spiders can produce up to seven types of silk with different mechanical properties by using different types of silk proteins [2]. Until now, complete gene sequences have been known only for dragline silk (or major ampullate silk) from the black widow spider (Latrodectus hesperus) [8], and only partial gene sequences have been determined for other spider silks. According to the known spider silk protein sequences, silk proteins share several common features in primary structures: (i) large number of amino acids, (ii) many highly repetitive units flanked by nonrepetitive N-terminal domain (NTD) and C-terminal domain (CTD), and (iii) high abundance of Ala, Gly and/or Ser. Each repetitive unit of major ampullate spidroin (MaSp) contains multiple repeats of short simple motifs such as (A) n (n = 4-10) and GGX (X = A, Q or Y) [2,8]. Similarly, the repetitive unit of minor ampullate spidroin (MiSp) also has repeats of such short motifs [9]. In contrast to MaSp, each MiSp repetitive unit contains an additional relatively large domain that lacks repeats of short motifs [9,10]. Unlike MaSp and MiSp, the repetitive units of aciniform and tubuliform silk proteins are complex and lack the short motifs [11][12][13][14][15].
At present it is clear that the composition of the repetitive units varies from one type of silk protein to another, which is suggested to determine the mechanical properties of a given type of silk.
Unlike the repetitive units, CTDs are relatively conserved among all spider silk proteins except the web glue proteins and are assumed to perform the same function [16]. This also applies to NTDs [17]. Recent studies on two spider species have shown that the NTD of MaSp can regulate the self-assembly of silk proteins in a pH dependent manner [18] and the CTD of MaSp can prevent premature aggregation by stabilizing the solution state of silk proteins and direct the alignment of repetitive units to form well defined fibers [19][20][21]. In spite of the high sequence identity (58%) between the CTDs of MaSp (CTD Ma ) from A. diadematus and E. australis, these two domains have distinct biophysical properties.
For example, the intermolecular disulfide bridge in the CTD Ma from A. diadematus was considered to be important to the domain stability [21], but the disulfide bridge was found to have only slight contribution to the thermal stability of the domain from E. australis and to be not critical for fiber formation [19]. The CTD sequence identity among different types of silk proteins is lower than that among the same type of proteins from different species. For instance, the CTD of tubuliform spidroin (TuSp) and CTD Ma from the same N. antipodiana share 29% sequence identity. Due to the low sequence identity, the CTD Ma existed as a dimer [21] but the isolated CTD of TuSp (CTD Tu ) existed as oligomers in aqueous solution [22]. Therefore, the same type of CTDs from different species and different types of CTDs from the same species may display diverse biophysical properties. In order to demonstrate whether different CTDs perform the functions in the same or different molecular mechanisms, it is necessary to characterize the structures and biophysical properties of individual domains of different silk proteins and their functional roles in silk formation and protein storage.
We previously reported a MiSp clone, clone 145, from the total silk gland cDNA library of N. antipodiana [12]. The deduced amino acid (aa) sequence comprises one repetitive domain (RP Mi , 128aa, previously named as spacer), one non-repetitive C-terminal domain (CTD Mi , 107aa), and one linker domain (LK Mi , 89aa, previously named as repetitive sequence) that links RP Mi and CTD Mi or in general links two structured domains ( Figure 1). Until now, the N-terminal domain sequence has not been determined yet for any MiSps. RP Mi and CTD Mi are conserved among different spider species ( Figure S1), but LK Mi s vary significantly in number of amino acids among different repetitive units in the same MiSp [9,10]. Although the CTD Mi from N. antipodiana and CTD Ma from A. diadematus share 44% sequence identity, the CTD Mi contains no cysteine residues but the CTD Ma has one disulfide linkage between two molecules which can enhance the stability of CTD Ma [19,21]. Moreover, RP Mi is unique to MiSps and its functional roles in protein storage and fiber formation are unknown. Besides the difference in amino acid sequences, MaSp silk is elastic when stretched and MiSp displays irreversible deforming [9]. Thus, MiSp may adopt a different self-assembly and fiber formation mechanism than the well characterized MaSp. In this work, we report the three-dimensional (3D) structure of CTD Mi from N. antipodiana, the secondary structures of RP Mi and LK Mi and their roles in conferring protein stability, solubility and fiber formation.

Cloning of RP and LK Domains of MiSp from N. antipodiana
Forward (5'-gcaaatgctatgaacagtttacttggt-3') and reverse (5'-attgcctaatgttgatacatatccacta-3') primers were designed on the basis of the known sequence of RP Mi [12]. MiSp fragments each containing one LK domain flanked by partial RP Mi sequences were obtained by polymerase chain reaction (PCR) from genomic DNA.

Protein Sample Preparation
The DNA sequence of our previously identified MiSp fragment (clone 145) [12] was confirmed here by PCR from our spider genomic DNA. The target genes encoding different MiSp regions (CTD Mi , RP Mi , RP-LK Mi , LK-CTD Mi , RP-LK-CTD Mi ) were amplified from clone 145 using specific primers and subcloned into a pET32-derived expression vector. The recombinant plasmids were transformed into E. coli strain BL21 (DE3). Cells were grown in LB or M9 medium at 37 °C to an OD 600 of 0.6. Right after induction with 0.2 mM IPTG (isopropyl β-D-thiogalactoside), cells were shifted to 20 °C and further cultured for 16 hrs. For 13 C, 15 N-labeled ( 15 N-labeled) samples, the cells were cultured in M9 medium containing 15 N-labeled NH 4 Cl and 13 C-labeled (non-labeled) D-glucose as the sole nitrogen and carbon sources. After overexpression, the proteins were purified by immobilized metal affinity chromatography, gel filtration and then ion exchange columns. All the proteins used here contained a 6xHis-tag and a thrombin cleavage sequence at the N-terminus.

Circular Dichroism and Protein Unfolding
All circular dichroism (CD) spectra were recorded on a Jasco J-810 spectropolarimeter equipped with a thermal controller. A 0.1 cm path length cuvette was used for all CD experiments. The far-UV spectrum of RP Mi was recorded using 20 µM protein in 10 mM sodium phosphate at pH 6.8. Both urea- and thermal-induced unfolding processes were monitored at 222 nm using samples with 10 µM protein and 10 mM sodium phosphate at pH 6.8. Urea denaturation curves for all MiSp constructs except RP-LK-CTD Mi were analyzed with the following equation (Eq. 1) derived from a two-state unfolding model [31].
where I obs is the experimental signal intensity in the presence of C molar urea, a and b are the intercept and slope of the pre-transition zone respectively, C m is the urea concentration at the transition midpoint, and m is the slope at the transition midpoint. For urea denaturation of RP-LK-CTD Mi , the experimental data were fitted using a linear combination of two two-state unfolding equations. Eq. 1 was also used to obtain T m by replacing C and C m by T and T m , respectively, where T is temperature, and T m is the temperature at the transition midpoint.
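Because the equation itself is not reproduced in this text, the following is only a minimal sketch of one common two-state denaturation model, not necessarily the exact form of Eq. 1. It assumes a linear pre-transition baseline (a + bC), a flat unfolded baseline (the hypothetical parameter `unfolded`), and a linear free-energy dependence on urea; here `m` is treated as a free-energy m-value in kJ/(mol·M), which may differ from the definition of m used in the text.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def fraction_unfolded(c, c_m, m, temp_k=298.15):
    """Fraction of unfolded protein at urea concentration c (M).

    Assumes dG_unfold = m * (c_m - c), so the fraction is 0.5 at c = c_m.
    """
    dg = m * (c_m - c)
    return 1.0 / (1.0 + math.exp(dg / (R_KJ * temp_k)))


def two_state_signal(c, a, b, c_m, m, unfolded=0.0, temp_k=298.15):
    """Observed signal as a population-weighted mix of two baselines.

    a, b     : intercept and slope of the pre-transition (folded) baseline
    unfolded : flat post-transition baseline (an assumption of this sketch)
    """
    f_u = fraction_unfolded(c, c_m, m, temp_k)
    return (a + b * c) * (1.0 - f_u) + unfolded * f_u


# Illustrative call with a midpoint of 4.8 M urea (the other numbers are made up).
midpoint_fraction = fraction_unfolded(4.8, c_m=4.8, m=5.0)
```

In practice the parameters would be obtained by least-squares fitting of the measured 222 nm signal against urea concentration; the thermal case follows by substituting T and T m for C and C m.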

Size Exclusion Chromatography
A Superdex™ 75 PG (GE Healthcare) column with a total volume of 120 ml was used to run all the protein samples. The running buffer for RP Mi contained 10 mM sodium phosphate (pH 6.8) with or without 100 mM NaCl. For other samples, only 10 mM sodium phosphate at pH 6.8 was used. The flow rate was 1 ml/min, and fractions were collected every 2 ml. The fractions were analyzed by SDS-PAGE to confirm which peak in the UV absorbance profile corresponded to the target protein. A molecular mass standard set consisting of Ribonuclease A (13.7 kDa), Chymotrypsinogen A (25 kDa), Ovalbumin (43 kDa), and BSA (67 kDa) was chromatographed to estimate the apparent molecular weights of target proteins.
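The apparent molecular weight estimate from the standards can be illustrated with a short sketch. It assumes the usual linear relation between log10(molecular mass) and elution volume; the standard masses are those named above, but the elution volumes are hypothetical placeholders, not measured values.

```python
import math

# (name, mass in kDa, elution volume in ml); volumes are HYPOTHETICAL placeholders.
STANDARDS = [
    ("Ribonuclease A", 13.7, 75.0),
    ("Chymotrypsinogen A", 25.0, 68.0),
    ("Ovalbumin", 43.0, 60.0),
    ("BSA", 67.0, 55.0),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mass_kda(volume_ml, slope, intercept):
    """Convert an elution volume to an apparent mass using the calibration line."""
    return 10 ** (slope * volume_ml + intercept)


volumes = [v for _, _, v in STANDARDS]
log_masses = [math.log10(m) for _, m, _ in STANDARDS]
slope, intercept = fit_line(volumes, log_masses)
estimate = apparent_mass_kda(60.0, slope, intercept)
```

A protein eluting earlier than expected for its monomer mass would then be read as a dimer or larger species, which is how the SEC data are interpreted in the text.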

Protein Solubility
The purified protein samples in 10 mM sodium phosphate or 10 mM Tris buffer (pH 7.0) were concentrated using centrifugal filter units with a 3 kDa cutoff membrane at a centrifugal force of 3000 × g. When the protein concentration was >5 mg/ml, 2 µl samples were regularly taken out from the solution until precipitate or gel was observed. Otherwise, larger volumes of samples were taken for concentration measurements.
To determine protein concentrations, the samples taken were diluted in the same buffers as those used for the protein samples. The concentrations were measured using the absorbance at 280 nm and also estimated using SDS-PAGE.
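The conversion from a diluted A280 reading back to the concentration of the original sample follows the Beer-Lambert law. The sketch below is illustrative only: the extinction coefficient, molecular mass, and dilution factor are hypothetical inputs, not values reported here.

```python
def concentration_mg_per_ml(a280, dilution_factor, ext_coeff_per_m_cm,
                            mass_da, path_cm=1.0):
    """Concentration of the undiluted sample in mg/ml.

    a280               : absorbance of the diluted sample at 280 nm
    dilution_factor    : fold dilution applied before the measurement
    ext_coeff_per_m_cm : molar extinction coefficient in 1/(M*cm) (hypothetical input)
    mass_da            : molecular mass in Da (g/mol)
    """
    molar = a280 * dilution_factor / (ext_coeff_per_m_cm * path_cm)  # mol/L
    return molar * mass_da  # g/L, numerically equal to mg/ml


# Made-up example: A280 = 0.5 after a 100-fold dilution.
example = concentration_mg_per_ml(0.5, 100, 10000.0, 12000.0)
```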

Shear Force-Induced Aggregation
To study protein aggregation induced by shear force, which plays a critical role in the natural silk spinning process, 2 ml samples containing 0.05 mg/ml protein and 10 mM phosphate buffer (pH 6.8) were placed into a UV/Vis cuvette with a small magnetic stir bar stirring at 500 rpm, 25 °C. The turbidity of the samples was monitored by measuring OD 350 on a BIO-RAD SmartSpec™ Plus Spectrophotometer at a series of time intervals.
To determine the effect of sodium chloride and sodium phosphate on the aggregation of RP-LK-CTD Mi , shear force-induced aggregation experiments were performed under two salt concentrations: 0 and 200 mM. The samples (1 mg/ml) placed in a 2 ml eppendorf tube were shaken at 150 rpm, 25 °C in an incubation shaker. At different time points, the samples were taken out. After removing the precipitate by centrifugation, the concentration of the soluble portion was measured and then the total amount of precipitated protein was calculated.
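The amount of precipitated protein is obtained by difference from the soluble fraction. A minimal sketch of that arithmetic is given below; the sample volume and the soluble concentration used in the example are hypothetical.

```python
def precipitated_mg(initial_mg_per_ml, soluble_mg_per_ml, volume_ml):
    """Mass of protein lost from solution, computed by difference."""
    if soluble_mg_per_ml > initial_mg_per_ml:
        raise ValueError("soluble concentration exceeds the starting concentration")
    return (initial_mg_per_ml - soluble_mg_per_ml) * volume_ml


def fraction_precipitated(initial_mg_per_ml, soluble_mg_per_ml):
    """Fraction of the starting protein that has precipitated."""
    return (initial_mg_per_ml - soluble_mg_per_ml) / initial_mg_per_ml


# Starting concentration of 1 mg/ml as in the text; the other numbers are made up.
lost = precipitated_mg(1.0, 0.25, 1.0)
```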

Scanning Electron Microscopy
A 1 ml purified protein sample containing 5 mg/ml RP-LK-CTD Mi in 10 mM sodium phosphate buffer (pH 6.8) was placed into a 2 ml eppendorf tube and the sample was shaken at 200 rpm, 25 °C for 5 minutes in an incubation shaker. Then, silk-like fibers formed in the tube were picked out with a needle. SEM micrographs of the fibers were observed on a JEOL JSM-6510 and photographed at a voltage of 15 kV and room temperature (24-26 °C).

Prediction of Disorder, Hydrophobicity and Aggregation Propensity
The disordered residues in LK Mi were predicted using PONDR-FIT (http://www.disprot.org/pondr-fit.php). If the disorder score of a residue is >0.5, this residue is considered disordered [32]. The aggregation-prone regions in LK Mi were predicted using Zyggregator (http://www-vendruscolo.ch.cam.ac.uk/zyggregator.php). When several consecutive residues each have an aggregation score larger than 1, this region is considered to be prone to aggregate [33]. The hydrophobicity plot of LK Mi was obtained using Protscale (http://web.expasy.org/cgi-bin/protscale/protscale.pl) with the scale option of Hphob./Roseman [34].
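The two thresholding rules above can be sketched as follows. This is an illustration of the post-processing only, not of PONDR-FIT or Zyggregator themselves; the minimum run length (`min_len`) stands in for "several consecutive residues" and is an assumption, and the scores in the example are made up.

```python
def disordered_fraction(disorder_scores, threshold=0.5):
    """Fraction of residues whose disorder score exceeds the threshold."""
    return sum(1 for s in disorder_scores if s > threshold) / len(disorder_scores)


def regions_above(scores, threshold=1.0, min_len=5):
    """Return 1-based inclusive (start, end) runs where every score > threshold.

    min_len is a hypothetical cutoff for what counts as a region.
    """
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # open a new run
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None  # close the run, kept or not
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))  # run reaching the last residue
    return regions


# Made-up per-residue aggregation scores for a 19-residue stretch.
demo_scores = [0.2] * 3 + [1.5] * 6 + [0.1] * 2 + [1.2] * 2 + [0.3] + [1.1] * 5
demo_regions = regions_above(demo_scores)
```

The short run at positions 12-13 is discarded because it does not reach the minimum length.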

Sequences of RP and LK Domains
Our PCR results from the genomic DNA show that all the repetitive domains in the MiSp from N. antipodiana are identical. At present, the exact number of repeats has not been determined because of the repetitive nature of the RP Mi sequence in the DNA. We identified 5 types of linker domains with sizes ranging from 83 to 174 aa in genomic DNA ( Figure S2). Glycine (45-48%) and alanine (33-39%) are dominant in linker domains, which is consistent with previous reports [9,10]. RP Mi is highly conserved among different species ( Figure S1B). Interestingly, the linker domain between the CTD and RP domains of N. antipodiana (LK Mi ) obtained here is much shorter than that of N. clavipes [9].

Solution Structures of CTD Mi , RP Mi and LK Mi
In aqueous solution, CTD Mi formed a stable homodimer as evidenced by size exclusion chromatography (SEC, Figure S3). The structure of CTD Mi was determined using distance and dihedral restraints derived from multidimensional NMR spectroscopy ( Figure 2A and Table 1). Overall, the structure of CTD Mi adopts a globular fold of two twisted five-helix bundles (α1 [Gly 18  ] which pack in parallel to form a homodimer. α5 is swapped to stabilize the dimeric structure. The major dimer interface involves helices α1/α5', α4/α4' and α5/α1'. Many hydrophobic residues are located in the interface and are in close contact, suggesting that hydrophobic interactions are the dominant factor for holding the two monomers together. Similarly, hydrophobic interactions among different helices in each monomer (involving 26 hydrophobic residues) are critical for the stability of the monomer. In addition, α4 is connected with α1 and α2 through two salt bridges R27-E77 and R36-E85 in each monomer. The formation of the R36-E85 salt bridge is evident from the extremely large chemical shift of 1 H ε of R36 (11.7 ppm) and the observation of 1 H η1 (10.5 ppm) and 2 H η2 (5.8 ppm) of R36. Although 1 H η1 and 2 H η2 of R27 were not detected, the sidechains of R27 and E77 are in close enough proximity to form a salt bridge. Mutation of R27 into A27 reduced the transition temperature of thermal denaturation (T m ) by ~20 °C (Figure S4), confirming the presence of the R27-E77 salt bridge.
The overall structure of CTD Mi is very similar to the previously reported structure of CTD Ma [21] with a Dali Z-score of 15. In addition, both CTD Mi and CTD Ma contain two intra-molecular salt bridges and have many hydrophilic residues located on the surface. Nevertheless, there are several key differences in local structures. 1) For the CTD Mi dimer, eight negatively charged carboxyl groups (four in each monomeric unit: E32, D61, D75, D103) are exposed on the protein surface ( Figure 2B), whereas there are no net charges on the surface of CTD Ma ( Figure S5). Note that each CTD Ma monomer contains only two negatively charged carboxyl groups and two positively charged guanidinium groups, which form two salt bridges [21] and are buried. 2) CTD Mi contains no cysteine residues and there is no intermolecular disulfide bridge, but one intermolecular disulfide bond exists in the CTD Ma dimer [21]. 3) There are more hydrophobic residues located between α5 and α3 and between α5 and α1' in CTD Mi than in CTD Ma ( Figure S6).
LK Mi (89 aa) contains 46.1% Gly and 32.3% Ala ( Figure S7A). It was predicted to be intrinsically disordered ( Figure 3A). Except for the region of G54-Y70, most residues have hydrophobic scores larger than zero ( Figure S7B), implying that LK Mi has low water solubility. To determine experimentally the secondary structure of LK Mi , we tried to produce it in E. coli, but the production was not successful because it was degraded rapidly during the purification process. Thus we used the bi-domain fragment, LK-CTD Mi . A comparison of the 1 H-15 N HSQC spectra of LK-CTD Mi and CTD Mi reveals that the backbone 1 H-15 N correlation peaks for the residues from the LK domain are located in the range of 7.7-8.5 ppm in the 1 H dimension and most Gly and Ala 1 H-15 N correlations are clustered together ( Figure 4A). This result shows that the LK domain is indeed intrinsically disordered. Except for the correlation peaks from the N-terminal region of the isolated CTD Mi (e.g., V17, G18 and T20), the peaks from the isolated CTD Mi have the same 1 H and 15 N chemical shifts as those from the CTD in the bi-domain LK-CTD Mi . Note that V17 is the N-terminal end residue of the CTD domain and is the connection site of the LK and CTD in the LK-CTD Mi construct. The signal of G48 in the bi-domain was weak and had the same chemical shifts as the G48 in the isolated CTD although it is not visible in Figure 4A. The results indicate that there are no or only weak interactions between the two domains at a protein concentration equal to or less than 0.5 mM. To evaluate the aggregation propensity of LK Mi , we predicted the aggregation-prone regions. The prediction shows that LK Mi contains three aggregation-prone regions with aggregation propensity scores >1: Y4-A12, A27-A37 and G67-A74 ( Figure 3B).
RP Mi was folded into a mainly α-helical structure in water as shown by its circular dichroism (CD) spectrum ( Figure 5). It existed mainly in a monomeric form at low protein concentration (<1 mg/ml) in the presence of 100 mM NaCl, but mainly as a dimer together with small oligomers in the absence of salt as indicated by SEC ( Figure S3). Very likely, the monomer, dimer and larger oligomers are in dynamic equilibrium since a single SEC peak was observed. The 1D 1 H NMR spectrum of RP Mi shows that this domain adopts a folded 3D structure since its methyl proton signals display good dispersion with one methyl at -0.13 ppm ( Figure 4B). The line width of the peak at -0.13 ppm was 40 Hz, which was significantly larger than that for the methyl signal of CTD Mi at 0.03 ppm (26 Hz) ( Figure 4B). Because the NMR line width is proportional to the molecular size, the apparent size of RP Mi must be significantly larger than the size of the CTD Mi dimer under the condition of 0.6 mM protein, 10 mM sodium phosphate and pH 6.8. As mentioned earlier, RP Mi (128 aa) consists of only ~20% more aa than CTD Mi (107 aa). Therefore, the NMR result further shows that RP Mi should exist in equilibrium between dimers and small oligomers at low salt concentration (10 mM sodium phosphate) and relatively high protein concentration (0.6 mM). Due to this oligomerization-prone feature, we have not yet solved the 3D structure of RP Mi .
At low protein concentrations (<1 mg/ml), RP-LK-CTD Mi existed as a dimer on the basis of the SEC data ( Figure S3). The dimer should be mediated through the CTD domain since the dimerization of CTD Mi is independent of protein concentration.

Stability of CTD Mi , RP Mi , LK-CTD Mi and RP-LK-CTD Mi
Full length silk proteins are extremely water soluble and stable when stored in the silk glands [35]. To understand how silk proteins are stored stably at high concentration, we investigated the stability and solubility of individual protein domains and their dependences on salt and protein concentrations that change significantly when the proteins pass through the spinning duct [3]. Although CTD Ma and CTD Mi have similar overall structures, their chemical and thermal stabilities are significantly different. The transition midpoints in urea (C m ) and temperature (T m ) denaturation of CTD Mi were ~4.8 M urea and ~71 °C, respectively ( Figure 6A and Figure S4, blue line), which are significantly larger than those of CTD Ma (~2 M urea at 10 mM phosphate and ~2.8 M urea at 500 mM NaCl, 64 °C at 10 mM phosphate) [21]. The result indicates CTD Mi is much more stable than CTD Ma . Interestingly, NaCl had nearly no effect on the chemical stability of CTD Mi (Figure 6A), while NaCl could stabilize CTD Ma [21]. The stability of CTD Mi was independent of protein concentration when the concentration was below 0.2 mM, but CTD Ma was much more stable against urea denaturation at a protein concentration of 5 mM than 0.2 mM [21].
To examine the importance of the solvent-exposed charges to the stability of CTD Mi , we prepared four conserved single-point mutants (E32Q, D61N, D75N and D103N) and one double-point mutant (E32Q/D75N). E32Q, D75N and D103N mutants showed significantly lower C m values than the wild type CTD Mi although the mutation of D61N had only a slight effect on the  (Figure 7, filled square and unfilled triangle). The results indicate that the solvent-exposed negatively charged residues are critical to the stability of CTD Mi . Interestingly, these negatively charged residues are conserved or partially conserved in all MiSp, but absent in the CTD Ma of A. diadematus ( Figure S1A). Besides the solvent-exposed negative charges, other factors such as hydrophobic interaction and hydrogen bonding, which are slightly different in the two CTDs, may also contribute to their significant difference in stability. The chemical stability of LK-CTD Mi and CTD Mi was nearly identical ( Figure 6A and 6B). This result shows that LK Mi has no obvious effect on the stability of CTD Mi , implying that LK Mi does not interact with CTD Mi and confirming the conclusion drawn from the comparison of 2D HSQC spectra. The T m of LK-CTD Mi was about 4 °C lower than that of CTD Mi . This should not have resulted from the interaction of LK and CTD, but could be caused by the gradual slight aggregation of LK-CTD Mi during the temperature ramping process. It is noteworthy that a small amount of precipitate was observed only for LK-CTD Mi and RP-LK-CTD Mi in the thermal denaturation process. Similar to CTD Mi , the stability of LK-CTD Mi was not influenced by salt when the NaCl concentration was below 500 mM.
RP Mi also displayed a typical two-state unfolding profile, but showed a much lower stability than CTD Mi ( Figure 6C). Its C m went up from 1.4 M to 2.3 M urea when NaCl concentration was increased from 0 to 500 mM. This salt-dependent stability is similar to CTD Ma but different from CTD Mi . Thermal denaturation ( Figure S8, black and blue lines) also shows RP Mi is much less stable than CTD Mi (T m : 53 vs 71 °C).
Unlike individual domains, the tri-domain fragment, RP-LK-CTD Mi , unfolded in two steps with the increase of urea concentrations. The denaturation curves were fitted ( Figure 6D), and the extracted first and second C m values were the same as the C m values of the isolated RP Mi and CTD Mi , respectively. This suggests RP Mi tends to unfold first, followed by the unfolding of CTD Mi . The result also indicates that the three domains of RP-LK-CTD Mi have no or very weak interactions in solution at low protein concentrations (~10 µM). This conclusion is consistent with that drawn from the NMR data analysis, and is further supported by the fact that the first-step unfolding of RP-LK-CTD Mi is obviously dependent on salt (C m : 1.4-2.3 M) and the second-step unfolding is nearly independent of salt (C m : 4.7-4.9 M) ( Figure 6D). The thermal denaturation data ( Figure S8, cyan line) also indicate that RP-LK-CTD Mi unfolds in two steps in a noncooperative way and the CTD in the tri-domain protein cannot stabilize the RP. Taken together, in spite of the high stability of CTD Mi , the tri-domain protein can easily undergo conformational changes due to the low stability of RP Mi when the protein is in a dimeric structure. To achieve high stability, the tri-domain protein and full length MiSp may assemble to form high order structures like oligomers. Due to the absence of the RP domain in MaSp, MaSp and MiSp may use different mechanisms to achieve high stability.
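The two-step behaviour can be pictured as a weighted sum of two independent two-state transitions. The sketch below is a simplified illustration, not the fitting procedure used for Figure 6D: it uses normalized logistic transitions, the midpoints 1.4 M and 4.8 M come from the text, and the equal weights and the steepness value are hypothetical.

```python
import math


def unfolded_fraction(c, c_m, steepness):
    """Logistic two-state transition: 0.5 at c = c_m, approaching 1 at high urea."""
    return 1.0 / (1.0 + math.exp(-steepness * (c - c_m)))


def two_step_signal(c, w1, cm1, k1, w2, cm2, k2):
    """Normalized unfolding signal as a linear combination of two transitions.

    w1, w2 : relative signal contributions of the two folded domains (assumed)
    k1, k2 : steepness of each transition in 1/M (assumed)
    """
    return w1 * unfolded_fraction(c, cm1, k1) + w2 * unfolded_fraction(c, cm2, k2)


def tri_domain(c):
    """Illustrative curve: RP-like step at 1.4 M, CTD-like step at 4.8 M."""
    return two_step_signal(c, 0.5, 1.4, 4.0, 0.5, 4.8, 4.0)


plateau = tri_domain(3.0)  # between the two midpoints
```

With independent transitions the curve shows a plateau between the two midpoints, where the less stable domain is essentially unfolded while the more stable one is still folded.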

Solubility of CTD Mi , RP Mi , LK-CTD Mi and RP-LK-CTD Mi
In 10 mM Tris buffer (pH 7.0), CTD Mi , LK-CTD Mi , RP-LK-CTD Mi , RP Mi and RP-LK Mi could be concentrated to about 300, 200, 150, 60 and 5 mg/ml before the observation of precipitate or gel. In 10 mM sodium phosphate (pH 7.0), the solubility of each protein was nearly the same as that in 10 mM Tris, indicating that the solubility is not affected by buffer. RP-LK Mi had the lowest solubility and was prone to precipitate. Other domains or fragments did not precipitate during the concentration process, but they formed gel when their concentrations were above their corresponding maxima. As shown in Figure 2B, CTD Mi is purely negatively charged and very polar on its surface. The electrostatic repulsion among negatively charged dimeric CTD Mi can prevent self-assembly for the formation of random aggregates. Therefore, the high hydrophilicity and unique charge of CTD Mi surface explains its extremely high solubility. On the basis of the high solubility and dimerization feature, CTD Mi has been used to generate large sized silk-like proteins for strong silk fiber production [36].
Although RP Mi readily formed small oligomers, its water solubility was still quite high. This may be achieved by burying some solvent-exposed hydrophobic patches through formation of oligomeric structures. RP-LK Mi was much less soluble than RP Mi , demonstrating that the solubility of LK Mi should be significantly lower than 5 mg/ml. The low solubility of LK Mi agrees with its high hydrophobicity ( Figure S7B). Interestingly, LK-CTD Mi and RP-LK-CTD Mi were very soluble. This may be explained by the presence of the highly soluble CTD Mi through mutual compensation in solubility. Most likely, however, the high solubility of LK-CTD Mi and RP-LK-CTD Mi is achieved through an alternative mechanism by forming oligomers. With this mechanism, the poorly soluble domains or fragments assemble to form oligomers through the aggregation-prone regions in LK Mi and/or RP Mi , leading to partial burial of solvent-exposed hydrophobic regions and thus to high solubility of the entire protein. The presence of such oligomers in the sample of 3 mM RP-LK-CTD Mi is evidenced by the significant increase of the line width of methyl proton NMR signals from the RP domain rather than from the CTD domain ( Figure 4B). Similar to RP-LK-CTD Mi , the full length MiSp (which comprises about 15 repeats of RP-LK Mi ) may also exist as oligomers in the silk gland where the protein concentrations can reach up to ~50% w/w [35].
Our results also suggest that CTD Mi and RP-LK Mi play distinct roles in maintaining MiSp proteins in a highly water soluble form, i.e., RP-LK Mi initiates the oligomerization through weak hydrophobic interactions among LK Mi and RP Mi domains and forms the core region of the oligomers, while CTD Mi prevents MiSp from forming precipitate by staying outside the oligomer core. The structure of the oligomers formed by MiSp fragments seems quite different from that formed by TuSp fragments, which resembles a micelle-like structure [22]. The different structures may result from the significant differences in the LK and CTD domains: LK Mi (89aa) is much larger and more hydrophobic than the linker region between the RP and CTD of TuSp1 (48aa); the isolated CTD of TuSp exists as oligomers but CTD Mi as dimers in aqueous solution [22]. MaSp was also proposed to form a micelle-like structure in which the repetitive domains are inside the micelle and CTD domains are outside [21]. Because of the significant difference in amino acid sequences, different types of silk proteins may use different ways to form high order structures for stable storage. In all the cases, however, CTDs are located outside the assembled structures to enhance the solubility of the assembled form.

Stability against Shear Force
In the natural silk spinning process, silk proteins pass through the spinneret in the silk gland and then become silk fibers. During this spinning, the proteins undergo conformational changes after encountering shear and elongational forces [3].
Here, we studied the effect of shear force on protein stability and aggregation by stirring protein solutions. In the absence of stirring (mechanical shear force), RP Mi , RP-LK Mi , RP-LK-CTD Mi , LK-CTD Mi , CTD Mi and maltose binding protein (MBP) could maintain a soluble state under the condition of 0.05 mg/ml protein, 10 mM phosphate, pH 6.8 and 25 °C without detectable precipitate within two days. In the presence of stirring, however, all of them tended to aggregate to form visible precipitate that is detectable at 350 nm. The changes in the amount of aggregated proteins with time are shown in Figure 8A. The C m and T m values of MBP in the absence of its ligand are 3.3 M urea and 63 °C respectively [37], indicating MBP is more stable than RP Mi but less stable than CTD Mi . Figure 8A shows that the aggregation rates of RP Mi , CTD Mi and MBP are inversely proportional to their thermal or chemical stability. The aggregation should occur through partial protein unfolding and then assembly of the partially unfolded molecules. Note that the partially unfolded proteins have more solvent-exposed hydrophobic residues than the folded ones. Therefore the more stable a protein is, the slower the protein unfolding is, and the slower the shear-force-induced aggregation is.
The aggregation rates of CTD Mi and RP Mi were substantially accelerated by covalently linking LK Mi domain to them respectively ( Figure 8A). This result can be explained by the high aggregation propensity and low water solubility of the LK Mi domain ( Figure 3B and Figure S7B). Although LK Mi is intrinsically disordered, stirring still greatly enhanced the aggregation rate of LK-CTD Mi and RP-LK Mi , implying that the aggregation-prone regions are partially protected in the bi-domain protein fragments and shear force can reduce the protection. The protection may be achieved by partial local folding of the aggregation regions of  Both NaCl and Na 3 PO 4 were able to enhance the aggregation of RP-LK-CTD Mi in the presence of shear force in similar rates ( Figure 8B). In the case of MaSp, the effect of NaCl on the aggregation of CTD and RP-CTD was much less pronounced than that of Na 3 PO 4 [21]. This result suggests that the fibroin storage and/or assembly conditions in MiSp and MaSp spider glands may be different.

Fiber Formation
All the single and bi-domain constructs underwent nonspecific aggregation or precipitation in aqueous solution upon gentle shaking. Under the same condition, however, RP-LK-CTD Mi could form small fibers with well-aligned structure and smooth surface even at a low protein concentration of ~0.3 mg/ml ( Figure 8C). The diameters of the formed fibers ranged from ~2-10 µm, similar to that of the native MiSp silk [38]. The result reveals that all the three domains should participate in the fine-tuned process of fiber formation. Previous studies on MaSp and TuSp have shown that the minimum sequence requirement for a silk protein fragment to form silk fibers is that the fragment should contain an RP region and a terminal domain [20][21][22]. A recent study has revealed that the RP domain of aciniform spidroin alone could form silk fibers [39]. Therefore, MiSp fragments and full length MiSp may adopt a different fiber formation mechanism from other spider silk proteins.
Given its low solubility (<~5 mg/ml), high hydrophobicity and aggregation propensity (Figure 3B, Figure S7B), and high aggregation rate in the presence of shear force (Figure 8A), LK Mi may act as a nucleation site that initiates the assembly of RP-LK-CTD Mi molecules through hydrophobic interactions among LK domains. Since RP Mi is prone to form oligomers and is unstable against shear force and chemical and thermal denaturation, it may help the LK domain assemble silk protein molecules and play a dominant role in the conformational changes induced by shear force. In the absence of CTD Mi, MiSp fragments such as RP-LK Mi and RP Mi formed only precipitate, indicating that CTD Mi is essential for silk fiber formation. The folded CTD Mi may regulate the alignment of the assembled molecules by controlling the assembly rate: it slows the aggregation of RP-LK Mi (Figure 8A), which leads to the controlled formation of well-defined fibers rather than nonspecific aggregation.

Conclusions
CTD Mi, RP Mi and LK Mi have very distinct stabilities and solubilities, which can be explained by their different structures, and each plays a specific role in the stable storage of MiSp fragments in vitro or of full-length MiSp in the silk gland. Because RP Mi and LK Mi are prone to oligomerization, they can initiate oligomerization through weak hydrophobic interactions among LK Mi and RP Mi domains and form the core region of the oligomers. In contrast, because of its high solubility, CTD Mi may prevent MiSp fragments or full-length MiSp from precipitating by staying outside the oligomer core.
Shear force greatly accelerates protein aggregation through partial unfolding of the protein. In the presence of shear force, the aggregation rate of a folded protein is inversely proportional to its thermal or chemical stability, whereas the aggregation rate of a multidomain protein containing both folded and disordered domains is determined mainly by the properties of the disordered domain and the solubility of the entire protein. Although all MiSp domains investigated here could self-assemble in the presence of shear force, only the tri-domain RP-LK-CTD Mi formed well-defined silk fibers, indicating that all three domains play distinct roles in fiber formation. Based on our experimental data, we propose that the LK domain serves as a nucleation site that brings different molecules together and that the CTD enables the assembled molecules to be arranged in a highly ordered manner in the presence of shear force. Although MiSp, MaSp and TuSp fragments assemble in different ways, the relatively conserved CTDs seem to serve the same function, i.e., maintaining the assembled form in a highly soluble state before fiber formation and regulating the alignment of the assembled molecules to form silk fibers. Because of the significant differences in biophysical properties among different types of CTDs, and in primary structures and properties among different types of RPs, the molecular mechanisms of self-assembly and fiber formation may differ among silk protein types. Revealing the detailed mechanisms will require further studies on the structures of the assembled forms.
, negatively charged and polar residues (including all backbone and side-chain atoms) are colored blue, red and light blue, respectively. Note that the exposed red and blue regions in the left panel are not from the charged carboxyl groups and guanidinium groups but from other parts of the charged residues. (PDF)
Figure S6 Comparison of hydrophobic interactions between α5 and α3 and between α5 and α1' for CTD Ma (a) and CTD Mi (b). Yellow and green represent hydrophobic and non-hydrophobic residues, respectively. Here Thr is considered hydrophobic. (PDF)
Figure S7 Amino acid sequence of LK Mi (a) and its hydrophobicity plot (b). (PDF)
Figure S8 Temperature-induced unfolding of different MiSp fragments. All curves except that for RP-LK-CTD were fitted using a two-state equation (Eq. 1). The curve for RP-LK-CTD was fitted using a linear combination of two two-state equations. All samples contained 10 μM protein and 10 mM phosphate buffer at pH 6.8. (PDF)
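The fitting scheme described for Figure S8 can be illustrated with a minimal sketch. Eq. 1 is not reproduced in this section, so the sketch assumes a common van't Hoff form of the two-state model with a temperature-independent unfolding enthalpy and flat baselines. The function names, the weight `w1` and all parameter values are hypothetical and are not taken from the paper.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fraction_unfolded(temp_k, tm_k, dh):
    """Fraction unfolded for a two-state transition.

    Assumes dG(T) = dH * (1 - T/Tm), i.e. a temperature-independent
    unfolding enthalpy dH (J/mol). This is an assumed model form,
    not necessarily the paper's Eq. 1.
    """
    dg = dh * (1.0 - temp_k / tm_k)
    k_unfold = math.exp(-dg / (R * temp_k))  # equilibrium constant [U]/[F]
    return k_unfold / (1.0 + k_unfold)


def two_state_signal(temp_k, tm_k, dh, folded=0.0, unfolded=1.0):
    """Observed signal as a mix of flat folded and unfolded baselines."""
    f_u = fraction_unfolded(temp_k, tm_k, dh)
    return folded + (unfolded - folded) * f_u


def two_transition_signal(temp_k, w1, tm1, dh1, tm2, dh2):
    """Linear combination of two independent two-state transitions.

    w1 is the hypothetical fractional contribution of the first
    transition to the total signal change; the second gets (1 - w1).
    """
    return (w1 * two_state_signal(temp_k, tm1, dh1)
            + (1.0 - w1) * two_state_signal(temp_k, tm2, dh2))


if __name__ == "__main__":
    # Illustrative parameters only (Tm in K, dH in J/mol).
    for t in range(290, 381, 10):
        s = two_transition_signal(t, w1=0.4, tm1=320.0, dh1=3.0e5,
                                  tm2=360.0, dh2=3.0e5)
        print(t, round(s, 3))
```

In an actual fit, the model parameters (midpoints, enthalpies, weight and baselines) would be adjusted by least squares against the measured unfolding curve. A curve with two well-separated midpoints, as assumed here, shows two distinct steps that a single two-state equation cannot reproduce.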