Viscoelastic properties of wheat gluten in a molecular dynamics study

Wheat (Triticum spp.) gluten consists mainly of intrinsincally disordered storage proteins (glutenins and gliadins) that can form megadalton-sized networks. These networks are responsible for the unique viscoelastic properties of wheat dough and affect the quality of bread. These properties have not yet been studied by molecular level simulations. Here, we use a newly developed α-C-based coarse-grained model to study ∼ 4000-residue systems. The corresponding time-dependent properties are studied through shear and axial deformations. We measure the response force to the deformation, the number of entanglements and cavities, the mobility of residues, the number of the inter-chain bonds, etc. Glutenins are shown to influence the mechanics of gluten much more than gliadins. Our simulations are consistent with the existing ideas about gluten elasticity and emphasize the role of entanglements and hydrogen bonding. We also demonstrate that the storage proteins in maize and rice lead to weaker elasticity which points to the unique properties of wheat gluten.


Introduction
When a dough from the wheat flour is gently washed in water, the cohesive mass that remains is gluten. Seed storage proteins constitute over 80% of this mass [1]. They  1. The formation of hydrogen and disulfide bonds: in every gluten protein over 30% of amino acids is glutamine, whose side chain can be both a donor and an acceptor of hydrogen bonds [2]. Disulfide bonds can hold gluten proteins together and generate a sort of a polymer gel [8]; 2. The loops-and-trains model [4,14] explains gluten elasticity in terms of "loops" that are created by neighboring polymers: they stay together because of hydrogen bonding, but when some bonds are broken (e.g. by water), a free space emerges between the polymers. Under stretching, cavities disappear and hydrogen bonds are made between elongated and nearly parallel chain fragments, forming "trains" that are formed along the direction of stretching and cause resistance to further stretching; 3. The entangled nature of the polymer network itself leads to a strain resistance, regardless of its chemical nature [7].
We can study these phenomena in our molecular dynamics model by measuring the number of inter-residue contacts described by the Lennard-Jones (L-J) potential, the number of disulfide bonds (that can form and rupture dynamically), the number and size of cavities (as determined by using the Spaceball algorithm [15,16]) and the number of entanglements (using the Z1 algorithm [17][18][19][20]). Structural properties, like the typical distance between the ends of a protein chain, can also be evaluated. We have also computed the mechanical work required to stretch gluten.
We consider gluten and the systems of its different fractions separately in order to elucidate the differences between these proteins. In addition, we also consider a control sample of proteins that do not exhibit such extraordinary viscoelastic properties [21]. Specifically, we use proteins derived from maize and rice for this task. Thus, altogether, we study 5 different systems. We show that many of the properties of gluten are substantially affected by "kneading" which we immitate by introducing periodic deformations.

Methods
Our molecular dynamics model is described in details in ref. [9]. The description, however, has not dealt with the formation of the disulfide bonds that are important for gluten.
Briefly, we use periodic boundary conditions in two directions (X and Y) and solid walls in the third direction, denoted as Z. The starting conformations of the proteins are constructed by generating self-avoiding random walks. They are then evolved by the methods of the coarse-grained molecular dynamics that were shown to correctly recreate properties of a set of intrinsically disordered proteins [9]. Rheological information is then obtained by deforming the box containing gluten proteins and recording the resulting force of response (see Subsection, "Simulation protocol").
We simulate gluten as a set of protein chains in an implicit solvent. Each amino acid is represented by one pseudoatom and subsequent residues in a chain are connected by a harmonic potential. Chain stiffness is introduced through bond angle and dihedral angle potentials as obtained from a random coil library [9,22]. These angle potentials have different forms based on the identity of residues constituting a given bond or dihedral angle: proline and glycine are treated as special cases. These particular amino acids are abundant in gluten [3].
Electrostatics are governed by the Debye-Hückel potential with relative permittivity that is dependent on the distance [23]: κ = 40 nm −1 r. Other nonlocal interactions are modeled by the 6-12 L-J potential, which can be attractive (in its full form) or repulsive (truncated at the minimum and shifted). Contacts between residues form and disappear dynamically based on different criteria. The criteria have been developed by studying atomic-level overlaps that occur in contacts arising in structured proteins. We distinguish between contacts arising from sidechain-sidechain (ss), sidechain-backbone (bs) and backbone-backbone (bb) interactions. However, the resulting depth of the potentials well, �, is common, no matter what is its origin. Contacts involving sidechains are amino-acid specific. The specificity is introduced primarily through the locations of the potential minima. Some gluten proteins are thought to be partially structured, so a part of the contacts are always attractive.

The DSB coarse-grained model
The Dynamic Structure-Based model has been introduced in ref. [9] and here we just give a brief summary. We label the interacting pseudoatoms by indices i and j. Their positions arer i andr j , and r i;j ¼ jr i Àr j j is the distance between them. The mass of each residue is set to the average amino acid mass m. The positions evolve according to the Langevin equations of motion with the time unit, τ, of � 1 ns, the damping coefficient γ = 2m/τ (corresponding to overdamped dynamics [24]), and thermal white noiseG i with variance σ 2 = 2γk B T: We solve the equations of motion by using the 5th order predictor-corrector algorithm [25]. Energy unit � is of order 1.5 kcal/mol [26,27]. The room temperature is then about 0.3-0.35 �/k B , where k B is the Boltzmann constant. Here, the simulations are run at 0.3 �/k B . The results are presented, approximately, in the metric units.
The excluded volume of the residues (whether involved in a contact or not) is ensured by the L-J potential that is cut at r o = 0.5 nm (so that V r (r i,j � r o ) = 0): Where s 0 ¼ r o � ð0:5Þ 1 6 , so that V r (r o ) = 0. i, j = i+2 interactions are always described by this repulsive potential. Connected i, j = i+1 beads interact via harmonic potential with the elastic constant of k = 5000 nm −2 �� and minimum at r b = 0.38 nm (consistent with other works [27,28]).
The distinction between the bb, bs and ss contacts in our one-bead-per-residue model is accomplished in the following way. When two beads come sufficiently close to each other, we compute two vectors based on positions of three consecutive beads (i-1,i,i+1): Àñ i is the negative normal vector, approximately pointing in the sidechain direction, andh i is the binormal vector that points in the direction of a possible backbone hydrogen bond. The relative directions of these vectors are checked, and if, for instance,h i andh j point towards each other, a bb contact is made. Each type of amino acid has a specific number of contacts it can make with its virtual "backbone" and "sidechain". Each ss contact making a pair of amino acids corresponds to different distances at the minimum of the L-J potential. On the other hand, that minimum is 0.5 nm for the bb contacts and 0.68 nm for the bs contacts. Independent of the source of the contact, the interaction is modeled by the potential of the same form: where σ attr depends on the type of interaction and specificity. However, if the contact is due to the bb interactions, its strength is doubled. Doubling the depth of the L-J potential for the bb contacts is consistent with the earlier literature findings [29]. If criteria for creation of a given type of contact are met, the attractive part of the potential is turned on at r min in a quasi-adiabatic fashion (during 10 τ). When the beads are in a distance larger than r f = f σ attr from each other, the potential is turned off in the same way. This is why we call those contacts "dynamic" as they can appear and disappear during the simulation. In references [9,30] and in this paper, we have used f = 1.5 (in an analogy to our description of relevant contacts in the structured proteins [26]). This model was previously used to study polyglutamine aggregation [30], so it should be well-suited for studying glutamine-rich gluten proteins. However, our more recent tests indicate that the choice f = 1.3 is better. In particular, it is more consistent with the ANTON-based simulations of α-synuclein [31]. Disulfide bonds. For modeling disulfide bonds, we considered using harmonic potential that is switched on and off, following Thirumalai et al [32]. However, we have found out that it is equivalent but much simpler to use the L-J potential with the strength of 4 � and r SS min ¼ 0:59 nm.
The disulfide bond potential is turned on and off in the same manner as other ss contacts, i.e. adiabatically within approximately 10 τ. It should be noted that the real timescales of the disulfide bond formation and their duration are much longer [33]. The only modification besides the amplitude change is that after two cysteins form such a contact, they cannot form additional disulfide bridges (without a prior rapture of the existing bridge).
Structured parts. Some of the gluten proteins contain domains near their termini that tend to be more structured than the whole chain. For example, homology of the A domain of high molecular weight glutenins to a plant inhibitor of digestive enzymes [34] suggests that one or two pairs of cysteines in that domain form fixed intra-chain disulfide bridges. We incorporated available information about possible structured parts of gluten by using a structure-based Go model for those regions [26]. The identification of segments that are recognized as structured domains was based on the UniProt [35] partition of gluten proteins into domains (see Section 1 of the S1 Text for details). For the backbone stiffness of the structured segments we used the same potential as for the unstructured parts.
The actual structures in these segments were predicted by using the ITASSER server [36,37]. The contact maps were generated based on those all-atom structures, using the overlap criterion (spheres corresponding to heavy atoms of the residues in contact must overlap [26]). The contacts present in the contact map in the structured regions are always attractive (they are "static", in contrast with the "dynamic" contacts operating in the disordered parts) and the L-J potential describing them has a minimum that corresponds to the distance between the C α atoms in the reference all-atom structure.
Residues listed in the "static" contact map can still make "disordered" contacts with the residues outside the contact map. The number of "static" contacts made by a residue is determined by the same r < r f criterion, but the potential does not switch off if r > r f .

Simulation protocol
We performed simulations of 5 different systems specified in Table 1. The exact protein composition of all simulated systems is presented and described in detail in Section 1 of the S1 Text.
While cereal storage proteins are still in the grain, they are usually located in the protein bodies or protein storage vacuoles [38]. After grain maturation, these deposits can fuse and form a continous network [1]. This is reflected in our simulation protocol: in the beginning, the proteins are in a low-density box and have time to attain their shapes separately [39]. Then, as the box dimensions get smaller, different chains connect together, forming one big cluster. Unfortunately we cannot reproduce drying and hydratation in an implicit-solvent model, but the final density of the system reflects the density of real gluten samples [40]. Mixing and kneading of the flour and the resulting dough is represented by the periodic deformations of the box, that can also be compared to the experimental deformations of (much larger) gluten samples in a rheometer [41].
The first step of the simulations is generating the chains as self-avoiding-random-walks positioned randomly in a box with density ρ = 0.1 res/nm 3 (residues per cubic nanometer). If an initial random walk configuration happens to cross the boundary of the box, the box is enlarged to keep all chains inside, resulting in a slighly lower initial density. The density of 0.1 res/nm 3 is more than 30 times lower than the density of gluten, which should ensure that proteins are in monomeric or dimeric form during the initial equilibration (see inset 1 in Fig 1). Boundary conditions along the X and Y axes are periodic, whereas the walls along the Z direction are repulsive. They are separated by the distance of s. The residues are repelled from the wall by the repulsive part of the L-J potential with the depth 4�. After equilibrating the system for 125 μs the box dimensions are reduced from the initial size with the speed of 2 mm/s, after the density ρ 0 (usually corresponding to the real density of gluten [40] ρ 0 � 3.5 res/nm 3 ) is reached (see Section 3 of the S1 Text for more information about the density). At this stage, the distance between the repulsive walls has a value that is denoted by s 0 . The system is then equilibrated for another 150 μs.
After the second equilibration stage (depicted in inset 2 in Fig 1) the walls along the Z direction start to attract the proteins. When the distance between any residue and the wall gets closer than d min = 0.5 nm, the attractive part of the L-J potential with depth of 4� is turned on The blue curve shows the force exerted on the system by those two walls (summed over all the residues and averaged over 100 ns time interval). The data is for gluten (defined in Section 1 of the S1 Text), with the density of 3.5 nm −3 . The period of oscillations is 40 ms. The six insets correspond to the snapshots of the system taken at different stages of the simulation (each bead represents one amino acid, the protein chains are shown in different colors). The same coordinate system (top left) is used for all 6 insets. adiabatically (in the same manner as for disulfide bonds). Then the interaction between the wall and a residue attached to it is described by V wall = 4V LJ (r i,w ), where r i,w is the distance between the ith pseudoatom and an ith interaction center on the wall. The X and Y coordinates of the ith interaction center are fixed at the values of the first bead attachment (when it came to the wall closer than d min = 0.5 nm, which is the distance corresponding to the minimum of the V wall potential). The Z coordinate of an interaction center is 0 on one rigid wall and s on the opposite wall. If a wall moves so that s 6 ¼ s 0 , the interaction centers move together with the wall. If the bead departs away from the interaction center by more than 2 nm, it is considered detached and the interaction center disappears. The bead may reattach at another place later. For d < d min the repulsive part of the L-J potential is always turned on to ensure that no residue can pass through the wall. The force acting on the walls is a sum of forces that individual residues exert on both walls. It is averaged over 100 ns interval to filter the thermal noise out. An alternative way to introduce interactions with the walls (involving the fcc lattice of psudoatoms) is discussed in Section 4 of the S1 Text.
After the wall attraction is enabled, we give another 150 μs for the equilibration of the system (inset 3 in Fig 1). After this third equilibration stage, when the attraction is already turned on, the walls in the Z direction are moved to the distance of s = s 0 − A, where A is the amplitude of the oscillations in the normal direction (we set A = 1 nm). After being moved to the minimum (shown in inset 4 in Fig 1), the wall position changes periodically: The wall at Z = 0 stays fixed. In the experiments, ω � 1 Hz [2], which is not possible to achieve in our simulations. Our default period is 40 μs, corresponding to f � 25 kHz. The maximum displacement is shown in inset 5 in Fig 1. For shearing simulations, the walls are also displaced with the same amplitude, A, but the oscillations take place in the X direction and are described by s 0 (t) = −A cos(ωt) (where s 0 = 0 corresponds to one rigid wall being directly above the other). See S1 Fig for an illustration.
After oscillations in the normal direction (or after additional 100 μs of equilibration if no oscillations were made) the system is equilibrated again for 75 μs and then the walls in the Z direction extend gradually to 2s 0 so that the total work required for such an extension can be measured, as well as the distance and force required to break the system apart (recreating conditions for measuring uniaxial extension [41]). Inset 6 in Fig 1 depicts this elongation. The speed of elongation is 0.5 mm/s, which is a typical pulling speed employed in AFM stretching experiments in coarse-grained simulations [26,42]. The maximum speed during oscillations (v max = Aω) is even lower.

Properties of gluten, maize and rice
Based on refs. [3,8,11,[43][44][45][46][47][48][49] we chose proteins that were representative for each of the five systems we study: gliadins, glutenins, gluten and storage proteins from maize and rice. We then used the UniProt database [35] to select the corresponding sequences. The resulting compositions are described in Section 1 of the S1 Text. Table 1 summarizes the key properties of each of the systems studied. A large number of cysteines per chain should increase the propensity to form inter-chain disulfide bonds. On the other hand, longer chains should make entanglements easier. Finally, the "stickiness" of the system is reflected in the coordination number z, which is defined as the mean number of contacts made by each residue (both "dynamic" and "static" combined).
We performed three types of simulations: a) without any oscillations; b) with oscillations with deformation in the normal (Z) direction; c) with oscillations with deformation in the shear (X) direction. All of the properties are calculated in the stage after oscillations (in the absence of oscillations-after the equivalent time is passed). The parameter z does not seem to be affected by the oscillations.
However, the values of 5 other quantities depend on the prior history of the system. Those quantities are: the number of entanglements l k , the maximum force F max , the mechanical work W max required to stretch the system, the number of the interchain contacts n inter and the RMSF (root mean square fluctuation) as averaged over all residues and determined as a deviation from a mean location of the residue.
It should be noted that each of the systems studied is simulated in a somewhat different box (to achieve the same density for different numbers of residues) and thus comparison of the absolute values of these quantities is not very meaningful. Instead, we analyze the ratios of the five quantities to their average values in the absence of oscillations (denoted by the tildas). The first of the quantities studied is the total number of entanglements. We have determined l k by using the Z1 algorithm [17]. An entanglement occurs if a path connecting the first and last residue in a chain cannot be shortened to a straight line without crossing a path from another chain [18,19]. After minimizing the lengths of all such paths with the constraint that they cannot cross one another, each kink, where two paths intertwine, counts as an entanglement [20]. An example of an entanglement is shown in inset 6 in Fig 1, where the end of the grey chain (a low molecular weight glutenin) crosses a loop made by a blue chain (a high molecular weight glutenin). Due to the low number of the chains, each snapshot of the simulation usually has only a few entanglements. We averaged l k over all snapshots made after oscillations.
We use a deterministic random number generator, which produces the same output when provided with same initial random number, called, "random seed" [50]). The simulations for each system were repeated at least three times with a different random seed. The same set of random seeds was used for the shearing, pulling and equilibrium simulations. The result shown in Fig 2 uses the mean values from all simulations, but the error of the mean is quite high. Nevertheless, a clear increase is observed for gluten: the oscillating deformations cause more entanglements. We did not include self-entanglements (knots) in Fig 2, but they are also present in gluten: some long proteins can form very deep knots (see S5 Fig), that can last for over 100 μs.
The differences between the studied systems are most pronounced in the case of F max , the maximum force required to stretch the system in the last stage of the simulation (illustrated in inset 6 in Fig 1). The blue curve in Fig 1 shows how the force rises and then drops near the end due to the rapture. However, different random seeds may lead to different force curves for the same system. Fig 3 shows three stretching force curves for three different simulations of gluten. Each curve has a number of force peaks. In order to determine the characteristic largest force, F max , we take 5 largest force points and calculate their mean value. These mean F max are also indicated in Fig 3 as dotted lines surrounded by strips that represent their standard deviation. The F max used in Fig 2 is a weighted average of those values (from simulations with different random seeds), where the weights are calculated from the standard deviation of the mean. Large glutenins experience the largest, 5-fold increase in F max after oscillations, while for gliadins there is even a very slight decrease. The rise in F max for gluten is substantially bigger than for corn and rice proteins.
The last stage of the simulations includes moving the Z walls apart with a constant speed. The response force may be easily integrated over distance, giving the work required to stretch the system. The maximum work W max required to stretch the system is also bigger if oscillations were introduced before. Again, the difference is largest for glutenins, this time followed by maize proteins. Sometimes the force drops to near zero after reaching the maximum (meaning that the system was ruptured) and the work stops rising (as seen for the dark blue curves on Fig 3). There are other cases where a substantial mechanical work is still required for elongating the system after reaching F max (cyan curve), but the total mechanical work W max is correlated with the height of the F max peak. The proteins apply some effective pressure on the  denominator, marked by~). The properties are the average number of entanglements l k , the maximum force F max during the elongation in the last simulation stage and the maximum work W max required to elongate the system in that stage, the number of inter-chain contacts n inter and RMSF (root mean square fluctuation) averaged over all residues. The ratio of 1 is marked by a broken line. Each color corresponds to a different system, as displayed at the top, and defined in Section 1 of the S1 Text. The system density is 3.5 nm −3 . https://doi.org/10.1371/journal.pcbi.1008840.g002

PLOS COMPUTATIONAL BIOLOGY
Viscoelastic properties of wheat gluten in a molecular dynamics study walls, however it is much smaller than the mechanical stress resulting from the stretching (the atmospheric pressure would exert 0.01 nN of force on the walls).
Even if entanglements cause sharp peaks in the force curves, the resistance against deformation is also controlled by the number of inter-chain contacts, n inter , that have to be ruptured during elongation. The biggest increase in n inter is observed for maize proteins. They have a lower number of contacts in total (as indicated by their coordination number z shown in Table 1), but their residue composition allows them to make more contacts due to the mixing effect of the oscillating deformations.
The oscillatory mixing effect is general and it operates in each of the systems studied. It results in a stronger network of proteins that are connected together more tightly, which results in the lowering of RMSF. Less mobile proteins may be more sturdy, but, as a trade-off, the maximum force may occur for a lower inter-wall distance s [51].

The molecular explanation of gluten elasticity
According to the "loops and trains" picture, the response of gluten to deformation consists of two stages: at the first stage the chains elongate and the cavities between them shrink, which requires smaller force than disentangling the chains and moving them alongside each other that takes place in the second stage [14,52]. We tested these predictions by measuring how 6 different quantities change during the final extension of the simulation box (see Fig 4). The snapshots shown in this figure indicate that the "trains" do form, and that shearing between the chains drawn in red and blue may lead to a substantial force by disrupting and reforming the contacts between the chains. Indeed the number of inter-chain contacts decreases in the beginning of the elongation, but then it increases again as "trains" are formed. The chains become more rod-like and less globular, as indicated by the distortion parameter w [53] (averaged over all chains) that is equal to 0 for an ideal sphere, and 1 for an ideal rod. It is defined as 1 is the smallest radius of inertia, R 3 -the largest and R 2 the intermediate. Formation of a "train" (two parallel chain fragments connected by hydrogen bonds) does not coincide with the maximum of the force, because F max occurs near the beginning of the elongation and is associated with the biggest decrease in the number of contacts. Nevertheless, the force does not drop to zero and the work required to stretch the system steadily increases, because the chains from the opposite ends of the box are still connected and cause further resistance against deformation.
In order to quantify the number and size of cavities that form in gluten, we used the Spaceball algorithm [15,16], which uses a probe with a user-defined radius to determine a threedimensional contour of the system. The volume of the cavities is computed by filling a grid with those probes and rotating the grid with respect to the system. Our system is quite large, so we used grid with 0.1 nm lattice constant, a probe with radius 0.38 nm and one rotation of the grid. The snapshots generated show that the biggest detected cavity is dispersed over the whole simulation box, but during the elongation it becomes associated with the set of proteins attached to the Z = 0 wall (for the trajectory shown). As they are more and more elongated, the size of the cavity decreases. Larger cavities become divided into many smaller cavities, which is reflected in the increase in n c .
Fig 4 illustrates just one simulation (the same properties for the same system, but with a different random seed, are shown in S6 Fig), but the volume of the biggest cavity indeed decreases after elongation in most cases (see S4 Fig). The chains become more rod-like not only during the final stretching, but also after the oscillations (see S2 Fig): deformations cause the proteins to be briefly stretched, but after the deformation ends, the proteins do not return to the exactly the same shape as before, but stay elongated in one direction (which may be called a memory effect).

Determining the dynamic shear modulus
One of the few fundamental (independent of the size and shape of the sample) rheological properties of gluten is the dynamic shear modulus G � = G 0 + iG 00 , measured by deforming the gluten sample with sinusoidal deformation (shear strain for G � and normal strain for E � ) [41]. The real part (G 0 , elastic modulus) represents the in-phase part of stress response, the imaginary part (G 00 , loss modulus) the out-of-phase response (phase difference δ is defined as tan d ¼ G 00 G 0 ). This is a non-destructive method, because the amplitude of oscillations is small (usually up to a few percent [2]). The total force F(t) that the system under a shear deformation exerts on the walls in the X direction (see S1 Fig), divided by the surface area of a wall S, defines shear stress ϕ(t) = F X (t)/S (the usual symbol for shear stress, τ, is a time unit in our model, so we use ϕ instead). A strain defined as γ = γ 0 cos(ωt) is expected to induce a stress in a similar form, shifted in phase by δ: ϕ(t) = ϕ 0 cos(ωt + δ). Dynamic shear modulus G � = G 0 + iG 00 is then defined as [54]: where ϕ 0 and γ 0 are the amplitudes of stress and strain accordingly. We define it by fitting to functions s 0 (t) = A cos(ωt) and F(t) = F 0 cos(ωt + δ). Here δ is the phase difference, γ 0 = A/s 0 and ϕ 0 = F 0 /S. This is a very basic method of determining the shear modulus and more advanced methods can be used [55]. However, this method can be directly compared to experiment, where a gluten sample is also put under strain that changes periodically: a rheometer consists of two plates that move with respect to each other, and the sample is placed between them [41]. The plates may be flat or have a different geometry: we recreate the first ("plateplate") case (the walls in the Z direction represent the moving plates). The movement pattern in both cases is the same and represents the sinusoidal strain. The relative amplitude of the oscillations is also similar [2]. The rheometer records the force required to move the plates [41], whereas in the simulations the forces are computed directly. Movement in the X direction is equivalent to a measurement in a shear rheometer (that measures G � ), whereas walls moving in the Z direction reflect experiments in an extensional rheometer (that is usually used for larger and more destructive deformations, like those described in the previous subsections [41]). In Table 2 and Fig 5 we compare the results with experiment. Table 2 shows that gliadins are responsible for the viscous properties of gluten (big δ), and glutenins for the elastic part (small δ). Despite largely different values of the results, this trend is still visible in our simulations for the longest oscillation period (70 μs): tan δ sim is biggest for gliadins and smallest for glutenins.
We use frequencies that are 4 orders of magnitude larger than in experiment, but some extrapolations can be deduced from the experimental frequency spectra of gluten's G � . We  used the results from two experimental papers [56,57] that use hydrated gluten samples. Both predict that tan δ should be rising with frequency-see the bottom panel of Fig 5, where tan δ sim is between the extrapolations predicted by [56] (green curve) and [57] (purple curve). Such large discrepancies are the effects of different gluten compositions, that greatly influence the results (we use the Aubaine dataset from [57]). Growing δ means that G 0 and G 00 converge, as seen in the top panel of Fig 5. Both G 0 and G 00 are rising with f, however this rise is insufficient to explain why the values from simulation are 3 orders of magnitude larger. Either there is a significant non-linearity for higher frequencies and the extrapolation no longer holds, or this is a limitation of our model. We use implicit solvent which cannot account for differences in gluten's hydration that may greatly affect the results (even, "dry" gluten is slightly hydrated) [56].
On the other hand, the results of the simulation are in good agreement with the experiments done on gluten-based bioplastics (as shown on the top panel of Fig 5), where gluten proteins are mixed with glycerol and melted at over 380 K [58]. Even for bioplastics, the tensile strength (the stress needed to break the deformed system) is less than 1 MPa [58], while the F max divided by the wall area (which is around 100 nm 2 ) can be more than 10 times higher (see Fig 3).

Discussion and conclusions
We have used our coarse-grained model with the implicit solvent to provide microscopic-level understanding of gluten and the related proteins. Kneading and mixing gluten samples is known to increase their tensile strength [59]. In our simulation, we mimick these actions by introducing periodic deformations. We have found that as a result of these deformations gluten and glutenins became significantly more resistant to stretching, as indicated by the rise in F max and W max . Our model is also able to correctly distinguish gluten from storage proteins from other plants, and to tell the difference between durable glutenins and viscous gliadins. This difference appears to come from the combined effect of entanglements and inter-chain contacts (that represent mainly hydrogen bonds between glutamines in the case of gluten). Maize proteins formed more inter-chain contacts than rice proteins (which may justify why maize flour is more useful for making gluten-free doughs [60]). Its chains were too short to gain additional mechanical resistance from being entangled.
The importance of entanglements and inter-chain hydrogen bonding is already wellknown in the experimental literature [7,8], but our simulations provide an additional confirmation of that. We also elucidated microscopic mechanisms of the loops-and-trains picture [14]. We have found that the decrease in the volume of the biggest cavity is accompanied by the rise in the number of the inter-chain contacts during stretching.
Due to the small system size only a few inter-chain disulfide bonds were simultaneously present during our simulations (most of them were observed for maize proteins) and they did not seem to influence the results significantly. It is possible that they may become more important at larger length scales, where covalent linkage of the chains affects properties like solubility [3]. However, for about 20 gluten chains, the entanglements and non-covalent contacts are sufficient to explain elasticity of the system.
We used the periodic deformation to calculate the dynamic Young modulus G � . We have observed that the theoretical frequency dependence shown in Fig 5 agrees with the experimental results for glycerol-plasticized gluten [58] better than with those for unrefined hydrated gluten [56,57]. In our implicit solvent model glycerol may theoretically replace water as the solvent. However, our model was parameterized with the assumption of a watery solvent [9]. It should be noted, however, that the experimental range of the frequencies is orders of magnitude smaller than implemented here. Further simplifications are needed to access the longer time scales.
Another reason for the very high G � may be the coarse-graining itself: a coarse-grained pseudoatom has smaller volume than an all-atom amino acid. This may lead to diffrent elastic behavior, as was observed previously in the case of virus elasticity [61].
Despite the deciding role of gluten in determining the dough quality, its measured G � values are not strongly correlated with it [2]. During baking, the gas bubbles inside the dough extend it far beyond the limit of linear deformations. Therefore, measurements of extension to over 200% of the initial size are more informative [2]. Such extensions were generated in the last stage of our simulations and were discussed above. Unfortunately, properties like W max and F max depend on the system size and geometry, so they cannot be directly compared to experimental values.
Another hypothesis that can be checked in our simulation is that gluten proteins can form a β-spiral from consecutive β-turns made by repetitive fragments like QPGQ [13] (such fragments are common especially in glutenins [4]). Scanning tunneling microscopy discovered a groove with 1.5 nm periodicity in glutenins [4]. However, the sample preparation procedures used prevented the study from validating the existence of the β-spiral in bulk gluten. Models of single glutenin structures consisting of repetitive β [62] and γ [12] turns were reported in the literature, however we did not observe similar structures in our simulations.
We do not discuss here how gluten proteins interact with the human body, but this widely disputed subject [63,64] can also be tackled by our model in the future. Simulations can also check what part of gluten proteins is solvent-accessible, and thus available for digestive enzymes in the stomach, which could shed new light on the gluten intolerance problem [65]. Period of oscillations is 70 μs (lower periods did not result in such significant changes in w). The ratio of 1 is marked by a broken line. Each color corresponds to a different system, as displayed at the top, and defined in Section 1 of the S1 Text. The system density is 3.5 nm  denominator, marked by~). The properties are the average number of entanglements l k , the maximum force F max during the elongation in the last simulation stage and the maximum work W max required to elongate the system in that stage, the number of inter-chain contacts n inter and RMSF (root mean square fluctuation) averaged over all residues. The ratio of 1 is marked by a broken line. Each color corresponds to a different system, as displayed at the top, and defined in Section 1 of the S1 Text. The system density is 3.5 nm −3 . (TIF) S4 Fig. A ratio showing a decrease in the volume of the biggest cavity during the last stage of the simulations (elongation). The V c 0 max is the average taken over the first half of the elongation process, V c 1 max is the average taken over the last half. The ratio of 1 is marked by a broken line. Each color corresponds to a different system, as displayed at the top, and defined in Section 1 of the S1 Text. The system density is 3.5 nm −3 . (TIF)