Identifying essential factors for energy-efficient walking control across a wide range of velocities in reflex-based musculoskeletal systems

Humans can generate and sustain a wide range of walking velocities while optimizing their energy efficiency. Understanding the intricate mechanisms governing human walking will contribute to the engineering applications such as energy-efficient biped robots and walking assistive devices. Reflex-based control mechanisms, which generate motor patterns in response to sensory feedback, have shown promise in generating human-like walking in musculoskeletal models. However, the precise regulation of velocity remains a major challenge. This limitation makes it difficult to identify the essential reflex circuits for energy-efficient walking. To explore the reflex control mechanism and gain a better understanding of its energy-efficient maintenance mechanism, we extend the reflex-based control system to enable controlled walking velocities based on target speeds. We developed a novel performance-weighted least squares (PWLS) method to design a parameter modulator that optimizes walking efficiency while maintaining target velocity for the reflex-based bipedal system. We have successfully generated walking gaits from 0.7 to 1.6 m/s in a two-dimensional musculoskeletal model based on an input target velocity in the simulation environment. Our detailed analysis of the parameter modulator in a reflex-based system revealed two key reflex circuits that have a significant impact on energy efficiency. Furthermore, this finding was confirmed to be not influenced by setting parameters, i.e., leg length, sensory time delay, and weight coefficients in the objective cost function. These findings provide a powerful tool for exploring the neural bases of locomotion control while shedding light on the intricate mechanisms underlying human walking and hold significant potential for practical engineering applications.

One of the most crucial aspects of human walking is the energy-efficient maintenance of our controlled velocity in the range of 1.0-1.6 m/s [18].Central pattern generators in the central nervous system engage rhythmic neural circuits that generate basic leg movements [19][20][21][22][23]. Equally indispensable are reflex control mechanisms, which provide rapid adjustments to external stimuli from the sensory organs [19][20][21]24], even for unexpected balance loss or unpredicted ground variations during walking, thereby maintaining velocity.Notably, humans modulate the reflex response depending on the task in locomotion [25,26].These findings indicate that reflex mechanisms have a crucial impact on achieving stable and energy-efficient walking.
A physical simulation is a powerful tool for exploring the neural basis of walking, a complex phenomenon generated by the interaction of versatile mechanisms [27].Previous studies have shown that reflex-based control systems can generate human-like walking in terms of muscle activities, joint angles, and torque patterns [28][29][30][31][32][33][34][35][36][37].Notwithstanding the great advancements thereof, these systems have difficulty regulating velocity, owing to the large number of control parameters that must be properly tuned [38], e.g., Geyer [28] used 36 control parameters to generate a steady gait in a two-dimensional musculoskeletal model.Previous studies have attempted to regulate velocities within the reflex-based control frameworks [30,32], but their methods were limited to transitions between predetermined velocities and did not provide precise velocity controls.Furthermore, the transitions between these predetermined velocities were accomplished through the utilization of distinct control parameters designated for these transitions.Given that walking speed assumes continuous values, achieving precise control over walking speed through such means would theoretically require infinite parameters for transitions.Because "transition" differs from "control" in terms of its adaptability, it is essential to extend reflex-based systems to affect precise velocity controls while improving energy efficiency to explore reflex control mechanisms in walking.Furthermore, the development of energy-efficient control over a wide range of velocities in the reflex-based system will lead to improving the performance of the controller for exoskeletons [39,40] and prosthetic leg [17,41] by adjusting their control parameters according to user walking velocity.
The purpose of this study is to extend the reflex-based control system to enable controlled walking velocities based on target speeds and gain a better understanding of its modulation and energy-efficient maintenance mechanism across a wide range of walking velocities.To achieve this, we developed a novel performance-weighted least squares (PWLS) method to design a parameter modulator that coordinates a vast number of control parameters for an input target velocity while maintaining energy efficiency.In short, the reflex-based control system with the parameter modulator optimized via the PWLS successfully and energetic-efficiently maintains the desired velocity from 0.7 to 1.6 m/s in a two-dimensional musculoskeletal model.Subsequently, the detailed analysis of the parameter modulator in the reflex-based system identified two key reflex circuits affecting energy efficiency across a wide range of walking velocities.The contributions of this work include (i) extending a reflex-based control system to include velocity control, (ii) providing an adaptive polynomial regression method that uses performance indices to control performance, and (iii) identifying the key reflex circuits related to energy efficiency across a wide range of walking velocities.

Musculoskeletal model
A two-dimensional musculoskeletal model with a height of 180 cm and weight of 80 kg was employed in this study, as illustrated in Fig 1A .Detailed parameters are listed in Table 1.The model was constructed within a MuJoCo simulation environment [42].We employed Mujoco because of its ability to conduct simulations at a relatively low computational cost while ensuring the validity of the physical calculations.Many physics engines such as OpenSim, which is often used in biomechanics research, models contact with the ground by using a springdamper model.This approach requires using smaller timesteps to prevent the model foot from penetrating the ground, which increases computational cost.MuJoCo introduces several formulations of the physics of contact.This allows for more efficient calculations and versatile contact behaviors.As recent studies [43,44] have aimed to transfer OpenSim models to MuJoCo to improve computational speed and use versatile contact behaviors, MuJoCo is an acceptable physics engine with physical validity involving contact.The model consists of 12 segments: torso, hip, two thighs, two knees, two shanks, two feet, and two sets of toes, based on previous studies [28,29].The motions of the model are constrained to the sagittal plane.The hip and knee segments possess no mass.The model has nine internal degrees of freedom, as illustrated in Fig 1B; a torso joint, hip joints, knee joints, ankle joints, and toe joints.The spring and damper constants for all joints apart from the toe joints are set to 0 Nm/rad and 1 Nms/ rad, respectively.The toe joints are set with a spring constant of 30 Nm/rad, which is consistent with [29], and a damping constant of 0 Nms/rad.The spring and damper within the joint generate a torque that is proportional to the distance from the equilibrium position and friction that is proportional to the joint's angular velocity, respectively.We set the sliding friction coefficient between the feet and the ground at a relatively high value, 2.5.Then, sliding is unlikely to happen.Fig 1A illustrates the locations of the muscle actuators that generate torque in the joints.Each leg has eight muscle actuators reflecting the gluteal (GLU), hip flexor (HFL), vasti (VAS), tibialis anterior (TA), soleus (SOL), hamstring (HAM), rectus femoris (RF), and gastrocnemius (GAS) musculature.

Muscle actuator model
The musculoskeletal model includes actuators that simulate biological muscle actions that produce torque in the joints [42,45].The muscle actuator model comprises an inelastic tendon with a rest length of l 0 and a muscle that contracts the tendon.The muscle actuator force, F, is a function of muscle length l, velocity v, and current activation level a 2 [0, 1.0].A higher activation level results in a greater force being produced by the muscle.For the computation, l and v are scaled by the equilibrium length, l 0 , as follows: The actuator force, F, is computed as follows: where F l and F v represent the force-length and force-velocity relationships, respectively.F p represents the passive force that is always present, regardless of activation, and F 0 denotes the maximum isometric force that takes different values for each muscle actuator, as listed in Table 2.These values are consistent with those of previous studies [28,29].The computations of F l , F v , and F p are detailed in S1 Appendix.Briefly, F l is a function that attains a maximum value at l = l 0 , F v is a function that returns a smaller value for faster contraction of the muscle actuator, and F p increases monotonically for l.The muscle current activation level, a, is calculated for the input stimulation signal, u 2 [0, 1.0]: where tðu; aÞ ¼ 0:01 � ð0:5 þ 1:5aÞ ðu > aÞ 0:04=ð0:5 þ 1:5aÞ ðu � aÞ

Reflex-based control
The reflex-based controller employed in this study is identical in principle to that of Wang et al [29].This controller is based on the Geyer model and incorporates additional control laws to adjust the target hip joint angle and to fix the hip and knee joints prepared for the heel strike.Consequently, it more robustly produces gaits and can manage non-steady states during transitions in walking velocities.Moreover, the extended model can maintain the biomechanical explanatory ability essential to discussing gait adaptation mechanisms, which may be lost through simplification.We did not make any ad-hoc adjustments to the basic reflex-based control model to ensure optimal simulation performance.The controller computes the muscle stimulation, denoted as u i , for each muscle, i, by incorporating sensory feedback with a time delay of Δt as input.The control law switches depending on whether the leg is in the stance or swing phase.Moreover, additional stimulation is introduced during the late stance and late swing phases.The control system comprises three primary functions: force feedback, length feedback, and muscle-driven proportional derivative (PD) control.
• Force feedback: The force feedback law returns the stimulation signal, u F i , in response to the actuator force, F i .In humans, the signals of the F i arise from Golgi tendon organs and are carried by type Ib afferents to the spinal cord.u F i is calculated as follows: where the gain, G i , is the positive control parameter, and Fi ðt À Dt i Þ is the scaled actuator force, (i.e., Fi ¼ F i =F 0 i ), which includes the sensory time delay, Δt i .

• Length feedback
Through length feedback, we calculate the stimulation signal, denoted as u l i , corresponding to the length of the muscle actuator, l i .This function models the stretch reflex of the muscle spindle.u l i is calculated as follows: where the target length, l tar i , and gain, G i , are positive control parameters.

• Muscle-driven PD control
The muscle-driven PD control generates the stimulation necessary to move joint θ j (Fig 1B ) to the target angle, y tar j .This function can be interpreted as a polysynaptic reflex that is mediated by the joint position afferent input from group III fibers and descending signals of the target joint angles from the supraspinal.The muscle-driven PD control laws are employed to stabilize the generated gait by the hip muscles during the stance phase to balance the torso and during stance preparation to prepare for ground contact (described in S1 Appendix).
For muscle actuator, i, which applies torque to joint θ j in the positive direction, u y j i is defined as follows: Conversely, for muscle actuator i, which applies torque in the negative direction, u y i is defined as follows: where the proportional gain, K i , and derivative gain, D i , are positive control parameters.We added a parameter modulator that allows a wide range of walking while maintaining energy efficiency to the previous model [29].The reflex-based controller in the spinal layer activates muscles.The parameter modulator in the supraspinal layer coordinates the control parameter set Y tar to achieve the input target velocity, v tar vel .Each control parameter, y tar i 2 Y tar , is calculated using the function P i (v x ) in the parameter modulator for the input target velocity, v tar x (Fig 4).P i (v x ) is derived via the polynomial regression of the relationship between the velocity and parameter value, which is collected using optimization by incrementally increasing/decreasing the target velocity.SIMBICON in the supraspinal cord adjusts the desired foot placements.

Optimizing control parameters
The dataset for polynomial regression is collected via optimization by incrementally changing the target velocity.In total, there are 56 control parameters, Y 2 R 56 .Details are provided in S1 Appendix.We optimize these parameters using the covariance matrix adaptation evolution + and − denote positive and negative feedback, respectively.In the stance phase, F + at GLU, VAS, and SOL generate compliant leg behavior.L + at TA prevents overextension of the ankle joint, which is suppressed by the F − from the SOL.F + at GAS contributes to push-off and prevents overextension of the knee joint.Muscle-driven PD controls at HFL, GLU, and HAM balance the torso.In the swing phase, L + facilitates leg swing, which is suppressed by L − from HAM. F + at GLU and HAM apply braking force to the swing leg.L + at TA raises the toes to create clearance between the feet and the ground.strategy (CMA-ES) algorithm [46], which is well-suited for nonlinear and non-convex optimization problems.In this algorithm, independent λ search points are sampled from a multivariate normal distribution, N , at each generation, g: where Y represents the reflex control parameter set in this study, m (g) represents the mean value of the search distribution at generation g, σ (g) represents the step size at generation g, and C (g) represents the covariance matrix at generation g.The sampled control parameter set, Y, is evaluated using the objective function, f.Then, using the top μ data points from λ offspring along with the evolution paths, p ðgÞ s and p ðgÞ C , which accumulate historical search directions, m, σ, C, p σ , and p C are updated.
The evaluation of Y is conducted from a fixed initial state until time step T 0 , in which the musculoskeletal model falls, with an upper limit T.More specifically, if the model falls before reaching an upper time step T, the evaluation is terminated at that time step T 0 , and else T 0 = T.We judge that the model has fallen when any segment above the knee comes into contact with the ground.By terminating the unnecessary evaluations prior to reaching the maximum time step, the computation time can be reduced.Optimization is conducted to determine the optimal solution that minimizes the objective cost function.To generate an energy-efficient gait that follows the target velocity, the objective cost function, f, is designed as follows based on previous studies [29,30]: where s t represents the state of the model at time step t as determined by the control parameter set, Y, r represents the reward function for state s t , α E represents the weight coefficient, and CoT represents the cost of transport [47].Reward function r is defined as follows: where r alive prevents the model from falling.If the model does not fall over long timesteps, this term can reduce the total cost.r forward represents the penalty for the difference between the current horizontal walking speed, v x , and the target velocity, v tar x , with a lower limit of -1.r torso is established to maintain an upright torso.r fall incurs a large cost if the model falls within 700 timesteps of the initial state.This prevents convergences at local minima, where the model falls forward immediately from the initial position at the target velocity.α E (CoT − 0.3) in f represents the energy cost added at the end of a trial.0.3 is the deviation, which in turn permits a larger weight coefficient on CoT.The CoT quantifies the energy efficiency of locomotion, and a lower value indicates better energy efficiency [47].The CoT is expressed by the following equation: where J represents the total metabolic energy, M represents the model mass, g represents the acceleration of gravity, and Δd represents the distance traveled.J is calculated by summing the total metabolic energy expended by all muscles, as described in previous studies [29,48].The detailed equations for J are provided in S1 Appendix.

Dataset collection for polynomial regression
We run two programs in parallel to collect the data efficiently.Within one thread, initially, the target velocity, v tar x , in cost function f (Eq (12)) is set to 1.3 m/s, which is the human selfselected speed [29].Control parameter set, Y, that generate a gait around v tar x are obtained.Then, v tar x is slightly increased to v tar x þ Dv x , and the corresponding control parameter set around the updated target velocity are collected.This process is repeated until v tar x reaches the upper limit of the target velocity, v tar x max (See S1 Appendix in detail).Within the other thread, the target velocity, v tar x is initially set to 1.2 m/s.Then, v tar x is slightly decreased to v tar x À Dv x and this process is also repeated until v tar x reaches the lower limit of the target velocity, v tar x min .The initial control parameters are set identically in both programs.
The control parameters, Y, are optimized using the G generation for each target velocity, v tar x .When v tar x is changed, σ in Eq (11) is reset to σ 0 and p σ , and the p C are emptied, whereas m and C are maintained.If model-walking is maintained at the upper timestep limit of the evaluation trial with control parameters Y, a tuple consisting of walking speed, control parameters, and CoT {(v x , Y, CoT)} is added to the dataset.Notably, the dataset with n data points, {

PWLS (Performance Weighted Least Square method)
Each control parameter, y tar i 2 Y tar , is modulated through an mth degree polynomial function, P i (v x ) (Fig 4), for the input velocity, v x , as follows: where ω ij are coefficients calculated using our proposed PWLS, a polynomial regression algorithm that minimizes the total squared performance-weighted error of data points.Unlike the normal least-squares method in which data points are treated without bias when calculating the total squared error, PWLS derives the polynomial function from dataset with bias.Thus, it assigns a greater weight to higher-performing data points to reinforce energy-efficient walking via regression.PWLS is similar to the weighted least-squares method [49].The weighted least square method is the polynomial regression algorithm that is used when handling heteroscedastic data, meaning that the variance among the measured points is not constant.The variables that cause the variance of observations are incorporated to weigh each data point.In our PWLS, each data point is weighted according to its performance instead of the factor that causes heteroscedastic.Suppose there are n data points, {(v x1 , y i1 ), . .., (v xn , y in )}, for each control parameter from the collected dataset, {(v x1 , Y 1 , CoT 1 ), . .., (v xn , Y n , CoT n )}.The total error between y i and the expected value derived from P i can be expressed in matrix form as where k�k 2 represents Euclidean norm and . .
where � denotes the Hadamard product, which takes two matrices of the same size and returns a matrix in which each element is the product of the original elements (see S1 Appendix).
In PWLS, the ω ij coefficients are determined so that they minimize the total performanceweighted squared error, E PWLS .Given that the error for each data point is weighted by its evaluated performance, β i , the errors are more suppressed for high-performing data points and permissible for low-performing data points.E PWLS is the square of Eq (25), The partial derivative of E PWLS with respect to ω i is given by where B is an n × (m + 1) matrix that expands β along the horizontal axis and is defined as follows: 2 R n�ðmþ1Þ : ð30Þ E PWLS attains its minimum value when Thus, the objective of PWLS is to solve Eq (31) with respect to ω i .Combining Eqs ( 29) and (31) yields Using the property, (U V) T = V T U T , Eq (33) can be rewritten as Here, β � V ω i in Eq (36) can be reformulated (see S1 Appendix) as follows: By substituting Eq (37) into Eq (36), we obtain Consequently, the polynomial coefficient, ω i , in P i can be computed as follows: Design of weight parameters β β i 2 β is the performance evaluated in the PWLS to weigh corresponding data i.In this study, we evaluate the performance of each data point in terms of energy efficiency using the CoT.
We design β such that it has a higher value when the CoT is low.Because the energy consumed in human walking depends on speed [50] In this study, M was set to 125, giving 2M + 1 = 251, which is approximately 0.5% of the control parameter sets collected in the dataset.Using the average CoT value around the ith data point, CoT i , we defined the weight for data i, β i , as follows: where A � 1 denotes a constant.β i > 1 indicates that the data are evaluated as favorable data and β i < 1 as unfavorable data.For example, when CoT i < CoT i (i.e., generated gait was more energy-efficient), 1 À

CoT j
CoT j � � is calculated above 0, and this yielded β i > 1.A smaller CoTi value in comparison to the average CoT value, CoTi, results in the exponential increase of β i based on the A value.Therefore, the parameter A determines the strength of the bias toward higher-performing data points, i.e., a larger A value indicates a greater bias toward favorable data.

Simple application example of PWLS
In this section, we explain how PWLS computes polynomials, based on a simple illustrative example.We assume that there is a dataset of a control parameter y i , comprising a total of 18 × 9 data points, each evaluated through the CoT as depicted in Fig 5 .The horizontal axis indicates the generated walking velocity and the vertical axis indicates the control parameter value.The black-colored data points correspond to a lower CoT value of 0.5 (more efficient), while the grey-colored data points correspond to a higher CoT value of 1.0 (less efficient).Fig 5 illustrates polynomials calculated through PWLS.It can be found that the polynomials pass close to the data points with highly evaluated data points with larger A, which determines the strength of the bias toward highly-evaluated data points (Eq (41)).Thus, by assigning a greater weight to higher-performing data points, walking energy efficiency can be reinforced via regression.

Results
In the optimization, the upper timestep for each evaluation trial was set to T = 5, 000 steps (25 s).The weight coefficients in the objective cost function, f, were set as α E = 5000, α v = 5 and α t = 1.0.We collected the dataset by running two programs in parallel with v tar x incrementally increase/decrease in 0.1 m/s intervals.Within one thread, v tar x was initially set to 1.3 m/s and increased to v tar x max ¼ 2:0 m/s.Within the other thread, v tar x was initially set to 1.2 m/s and decreased to v tar x min ¼ 0:4 m/s.For each v tar x , λ = 20 points were sampled per generation, with a generation number of G = 300, which we confirmed that optimization was converged (See S1 Appendix).This resulted in 102,000 evaluation trials.The other hyperparameters for CMA-ES were set to σ 0 = 0.1 and μ = 7. λ and μ are recommended value [46].σ 0 was set to be large, but not out of the solution space in the optimization process.Data collection required approximately three days on a Lenovo ThinkPad E470 20H2S04L00.Out of all trials, we added approximately 50% to the dataset.Fig 6 illustrates the generated walking speeds and the CoTs in a dataset, indicating a concave quadratic relationship similar to human walking [50].Using PWLS, we obtained P i (Fig 4).We found that at least a polynomial degree of m = 6 is required to generate gaits.Therefore, to compute polynomials with minimal computational cost and to avoid overfitting, we chose a polynomial degree of m = 6.

Generated gaits
To stabilize walking before setting v tar , we controlled the musculoskeletal model with parameters that produced a stable gait at 1.25 m/s for a distance of 2.0 m from the initial position.We found that steady walking was generated for v tar x ¼ 0:75 À 1:6 m=s by using the optimized functions derived with A = 1 − 10 6 in Eq (41) (see S1 Video).Here, We defined steady walking as being able to walk more than 30 m without falling.The actual speeds were approximately to v tar x (see S1 Appendix).Fig 7 displays snapshots of the generated gaits for v tar x at 0.9 (slow), 1.25 (normal), and 1.6 (fast) m/s.Fig 8 illustrates the ground reaction forces (GRF) and kinematics of the generated gait using the optimized functions derived with A = 10 6 and those of humans at corresponding speeds [51].Although undershoots and overshoots were observed, the cross-correlation values (R) between the GRF of the generated and human gaits were consistently positive.In kinematics, the cross-correlation values for hip and knee joints were close to 1, whereas the value for the ankle joint was nearly 0.

Performance evaluation of PWLS
In this section, we evaluate the optimization performance of the proposed polynomial regression (PWLS) method.PWLS calculates regression curves by weighting higher-performing (i.e., energy-efficient) data points, in which A determines the strength of the bias toward them.(41).To quantify the performance of the PWLS, we defined

Identifying factors essential for improving energy efficiency
In this study, we aim to identify the key reflex circuits that influence energy-efficient walking across a wide range of velocities.First, to identify the essential factors for improved energy efficiency, we substituted the optimized function of each control parameter that derived at A = 10 6 , which achieved more energy-efficient walking over a wide range of velocities, in the gait with A = 1.From the results, we found that G HFL , l tar HAM and y tar t are the three parameters that changed Δ| R CoT| value by more than 0.5%.This result suggests that reflex circuits that involve these parameters may have a significant impact on energy-efficient walking across a wide range of velocities.The two lengthfeedback reflex circuits, G HFL ðl HFL À l tar HFL Þ and G HAM HFL ðl HAM À l tar HAM HFL Þ, is the primary circuits involved in these parameters, G HFL and l tar HAM : both reflex circuits stimulate HFL muscle during the swing phase (Fig 13).y tar t , which is the reference angle of the torso, changed Δ| R CoT| value of the third.This parameter was not related to a specific reflex circuit but was shared with several reflex circuits.

Key reflex circuits for energy-efficient walking over wide range velocity
The aforementioned results suggest that G HFL ðl HFL À l tar HFL Þ and G HAM HFL ðl HAM À l tar HAM HFL Þ could be the parts of the essential reflex circuits on the wide-range velocity walking with maintaining energy efficiency.Therefore, we proceeded to investigate how modulating these reflex circuits affects the energy efficiency of the gait.We measured the change in R CoT values when the optimized functions of the control parameters associated with G HFL ðl HFL À l tar HFL Þ (reflex circuit 1) and G HAM HFL ðl HAM À l tar HAM HFL Þ (reflex circuit 2) were replaced with those derived with A = 10 6 in the generated gait with A = 1 (a baseline), similar to  ) and reflex circuit 2 (i.e.G HAM_HFL and l tar HAM H FL ) were replaced with those derived with A = 10 6 in the generated gait with A = 1, respectively.The pink bar represents when the optimized function of the control parameters associated with both reflex circuits (i.e., G HFL , l tar HFL , G HAM_HFL , l tar HAM H FL ) were replaced.The purple bar indicates the value in the generated gait using the optimized functions derived with A = 10 6 .When circuit 2 was modulated, some generated gaits did not follow the specific input v tar vel , with their velocities being changed by more than 0.05 m/s.Data acquired from the target velocity that resulted in these outliers were excluded from the calculation of estimated CoT curves.https://doi.org/10.1371/journal.pcbi.1011771.g014parameters for the case of A = 10 6 shown in the purple bar.This finding strongly suggests that the two length-feedback reflex circuits, G HFL ðl HFL À l tar HFL Þ and G HAM HFL ðl HAM À l tar HAM HFL Þ, are the key reflex circuits of the energy efficiency in the reflex-based bipedal walking control.
To elucidate the mechanism behind the improved energy efficiency, we examined the change in energy consumption by individual muscles, muscle activation patterns, and stimulation signal for the corresponding muscle when we modulated the two identified reflex circuits.Fig 15 shows the energy consumed by individual muscles during a 30 m walk at v tar x ¼ 1:0.We found that energy consumption of the HFL, GLU, and HAM muscles was significantly reduced after reflex circuits 1 and 2 were modulated.Fig 16 shows the muscle activations of the bipedal model for v tar x = 1.0 m/s.HFL muscle was mainly activated in the swing phase while GLU and HAM muscles were activated in the stance phase.Fig 17 illustrates the stimulation signal applied to HFL muscle during the swing phase before (blue line) and after the modulation (pink line) of reflex circuits 1 and 2: the stimulation signal applied to HFL muscle decreased after the modulation of these identified reflex circuits.

Robustness to parameter changes
Under various parameter conditions, leg length, sensory time delay, and weight coefficients for the cost function, we have validated the reliability and robustness of the identified two reflex circuits concerning their impact on energy-efficient walking over a wide range of velocities:

Discussion
This study aimed to extend the reflex-based control system including velocity control and identify key reflex circuits that play a significant role in energy-efficient walking.We demonstrated that a musculoskeletal model driven by reflex-based control can achieve controlled speeds based on the input target velocity.Subsequently, by utilizing the generated gaits of varying energy efficiencies, we identified two key reflex circuits having a significant impact on CoT values.
We found that the proposed PWLS regression method optimizes the parameter modulator for the reflex-based controller to reproduce more energy-efficient walking across a wide range of velocities.We verified that the R CoT values were decreased by giving the bias to the highperforming data in calculating regression curves as shown in Fig 10 .These results demonstrate that the PWLS fitting method is more adaptive in evaluating each data point as a weight for performance.However, we also found that the walking generated at an excessively large A value resulted in less stability, resulting in a fall at specific target velocities.This can be attributed to insufficiently fast leg swing.Putting the swing leg fast enough in front of the stance leg is essential to prevent falling down [52].As illustrated in Fig 17, the stimulation applied to the HFL during the swing phase decreased with larger A. Consequently, this led to a slow hip flexion resulting in less stability.
As shown in Fig 14, we identified that G HFL ðl HFL À l tar HFL Þ and G HAM HFL ðl HAM À l tar HAM HFL Þ significantly contributed to improve the energy efficiency of the gait generated through reflexbased control.Furthermore, this conclusion has been robustly validated across various settings, including a different body structure, different neural systems, and different cost functions, as shown in Fig 18 .Our findings elucidated that the modulation of these reflex circuits resulted in a reduction of stimulation to the HFL during the swing phase, subsequently leading to reduced energy consumption by the HFL, GLU, and HAM (Fig 15).HFL expended less energy due to reduced stimulation.GLU and HAM were mainly activated during the stance phase (Fig 16), and these two muscles are used to maintain the torso balance in the stance phase [28].Reduced HFL activity during the swing phase leads to less energy consumption at the contralateral GLU and HAM because the work to compensate for torso acceleration caused by HFL is decreased.Therefore, it can be concluded that minimizing the stimulation to the HFL during the swing phase while ensuring sufficient hip swing to prevent falls is essential to improve energy efficiency in reflex-based walking control.This not only reduces HFL activity but also the effort of the GLU and HAM, which compensates for torso acceleration caused by HFL.
The generated walking has several characteristics that demonstrate its biomechanical validity in comparison to the human gait.(i) the R values of the hip and knee kinematics were close to 1 (R > 0.94 in all cases); (ii) The trajectory of the hip segment height from the ground exhibited a sinusoidal pattern (see Fig. S7 in S1 Appendix) [53]; (iii) we observed both heel strike and toe-off events [54] (see attached movie file); and (iv) a quadratic relationship between walking velocities and CoT [50] ( Fig 10).These features are similar to those of human walking from a biomechanical perspective, proving that our simulations do not compromise the dynamically reasonable properties of human walking.Although we agree that reducing the number of control parameters would be essential and helpful for understanding the mechanism underlying walking, we used Wang's extended model rather than Geyer's original model to robustly generate walking without losing biomechanical explanatory ability.
The results of this paper have some limitations concerning similarity to human walking.First, overshoots and undershoots were observed in the measured GRFs (Fig 8) and time evolution of the walking velocities (Fig 9).In the GRF profiles, we found two undershoots in GRF x and two overshoots in GRF z .The first occurred at heel strike, and the second occurred at toe-off of the contralateral leg.These overshoots and undershoots can be suppressed by using a lower-impedance ground.However, employing a lower-impedance ground resulted in foot penetration into the ground (refer to S1 Appendix).While the extreme GRF peaks are not biomechanically meaningful, we strongly believe that it is difficult to accurately model contact in a simulation environment [55,56].Moreover, we did not observe abrupt or significant changes in joint kinematics attributable to these extreme GRF peaks.Thus, given the difficulty of modeling contact in the simulation environment, the extreme peaks in GRF do not significantly affect kinematics.We conclude that the effects of the extreme peaks of GRF on walking in the musculoskeletal model do not negatively affect the biomechanical meaning of the findings obtained in this study.Second, the cross-correlation values of ankle dorsiflexion were close to 0 compared to hip and knee joints, as shown in Fig 8 .At knee joints, the model straightened the knee earlier in the stance phase compared to humans.This strategy is known to generate more efficient solutions in gait optimization [32].The previous study also showed low cross-correlation values at ankle joints when optimized to minimize energetic cost [32].Third, we designed the cost function employed in this study to have minimal task terms to generate a human-like gait with more weight on the energy efficiency-related term.This approach optimized the control parameters to improve energy efficiency, consequently allowing the identification of factors that are significant to the energy efficiency of the generated gait over a wide range of walking velocities.More complex cost functions [29,34,36] may be required to generate human-like ankle joint kinematics.Finally, although this study successfully demonstrated the implementation of velocity control in a reflex-based control system, it took 20 s to transit walking speed by 1.0 m/s without falling.On the other hand, humans can adjust their walking speed by 1.0 m/s in less than 2 s [57].Consequently, our control framework lacks the component to change walking speeds rapidly while ensuring stability.Extending the proposed framework is one of the future works to achieve velocity control as fast as humans.
Human locomotion involves complex interactions between descending supraspinal commands, interconnected spinal circuits involving reflexes and CPGs, and the musculoskeletal system.This study primarily focuses on reflexes in the spinal cord.Therefore, we must add other control components for fast walking velocity control.Moreover, we constrained the bipedal model motion into the sagittal plane.Hence, the extension of the motion into three dimensions is another essential future work to apply the controller to engineering applications.

Conclusion
Reflex mechanisms contribute significantly to the generation of stable and energy-efficient walking.However, a major limitation of generating gaits in musculoskeletal models through reflex-based control is the difficulty in precisely regulating velocity due to the large number of control parameters that need to be properly tuned.Extending reflex-based systems to affect velocity controls is essential to explore the reflex modulation mechanism and to understand its energy-efficient maintenance mechanism across a wide range of velocities.Furthermore, the development of energy-efficient control over a wide range of velocities in the reflex-based system will facilitate advanced engineering applications.Therefore, we developed a reflex-based control framework that enables the regulation of walking velocity over a wide range of velocities.Our parameter modulation method using PWLS that calculates the control parameters in response to a target velocity while optimizing efficiency successfully demonstrates generating walking gaits from 0.7 to 1.6 m/s.Furthermore, after a detailed analysis of the parameter modulator in a reflex-based system, we identified that the modulations of two reflex circuits, G HFL ðl HFL À l tar HFL Þ and G HAM HFL ðl HAM À l tar HAM HFL Þ, improve energy efficiency of the gait.The coordinated activity in the swing phase between the HFL and HAM reduced the stimulation applied to HFL during the swing phase, which not only caused the reduction of HFL activity but also alleviated the effort of GLU and HAM that compensates for the torso acceleration induced by the HFL.This research will inspire future investigations into reflex mechanisms and facilitate the development of advanced walking control systems for practical applications, such as gait-assisted exoskeletons and prosthetic legs, and robot control.

Fig 1 .
Fig 1. Musculoskeletal model employed in this study.A. Structure of the musculoskeletal model.Left: Oblique view of the bipedal model in the MuJoCo simulator.Center: Side view of the model with segment parameters.Right: Muscle alignments.B. Internal degrees of freedom of the model.θ t , θ h , and θ k represent torso joint angle, hip joint angle, and knee joint angle, respectively.The red arrows depict the positive direction of joint rotation.https://doi.org/10.1371/journal.pcbi.1011771.g001

Fig 2
Fig 2 illustrates the reflexes in the stance phase and swing phase.Please see S1 Appendix or the original paper [29] for more details.

Fig 3
Fig 3 displays  the control diagram of the musculoskeletal model, which is structured into spinal and supraspinal layers.We added a parameter modulator that allows a wide range of walking while maintaining energy efficiency to the previous model[29].The reflex-based controller in the spinal layer activates muscles.The parameter modulator in the supraspinal layer coordinates the control parameter set Y tar to achieve the input target velocity, v tar vel .Each control parameter, y tar i 2 Y tar , is calculated using the function P i (v x ) in the parameter modulator for the input target velocity, v tar x (Fig4).P i (v x ) is derived via the polynomial regression of the relationship between the velocity and parameter value, which is collected using optimization by incrementally increasing/decreasing the target velocity.SIMBICON in the supraspinal cord adjusts the desired foot placements.

Fig 2 .
Fig 2. Key reflexes in the stance and swing phase.F represents force feedback, L represents length feedback, and PD represents muscle-driven PD control.+and − denote positive and negative feedback, respectively.In the stance phase, F + at GLU, VAS, and SOL generate compliant leg behavior.L + at TA prevents overextension of the ankle joint, which is suppressed by the F − from the SOL.F + at GAS contributes to push-off and prevents overextension of the knee joint.Muscle-driven PD controls at HFL, GLU, and HAM balance the torso.In the swing phase, L + facilitates leg swing, which is suppressed by L − from HAM. F + at GLU and HAM apply braking force to the swing leg.L + at TA raises the toes to create clearance between the feet and the ground.

Fig 3 .
Fig 3. Control diagram for the musculoskeletal model.The musculoskeletal model is driven by activated muscles receiving stimulation signals, u, from the reflex-based controller in the spinal layer.Each leg (right and left) is controlled by separate stance and swing reflexes.The control law switches depending on whether the leg is in contact with the ground.Sensory information from the model is fed back to the controller for motion generation.The supraspinal layer controls the walking velocity and adjusts the desired foot placements.The parameter modulator coordinates the control parameter set Y tar to achieve input target velocity, v tar vel .SIMBICON adjusts the target hip angle, y tar h , in the stance preparation phase using the sensory information (d and v x , see S1 Appendix).https://doi.org/10.1371/journal.pcbi.1011771.g003

Fig 4 .
Fig 4. Parameter modulator.This is located in the supraspinal layer of the controller (Fig 3).By using the polynomial function, P i , each control parameter, y i , is adjusted to achieve the input target velocity, v tar x .https://doi.org/10.1371/journal.pcbi.1011771.g004

Fig 5 .
Fig 5.The regression curve derived through PWLS with different A values.Upper: The prepared dataset comprises 18 × 9 data points, each is evaluated through the CoT.The black-colored data points represent a lower CoT value of 0.5 (efficient), while the gray-colored data points represent a higher CoT value of 1.0 (inefficient).Lower: Calculated regression curves, with a polynomial degree set to 6. https://doi.org/10.1371/journal.pcbi.1011771.g005

Fig 9 .
Fig 9.The regression curve derived through PWLS with different A values.Upper: The prepared dataset comprises 18 × 9 data points, each is evaluated through the CoT.The black-colored data points represent a lower CoT value of 0.5 (efficient), while the gray-colored data points represent a higher CoT value of 1.0 (inefficient).Lower: Calculated regression curves, with a polynomial degree set to 6. https://doi.org/10.1371/journal.pcbi.1011771.g009

Fig 10 .Fig 11 .
Fig 10.Contribution of the PWLS to gait generation.Left: Estimated CoT curves for generated gaits with different A values.Right: Relative R CoT value of the generated gaits with different A values (n = 3).The generated gait with A = 1 is the baseline.R CoT is the integral of the estimated CoT curves from the lower limit velocity (=0.7 m/s) to the upper limit velocity (=1.6 m/s).https://doi.org/10.1371/journal.pcbi.1011771.g010

Fig 12 .Fig 13 .
Fig 12. Absolute change in R CoT values when the optimized function of each parameter was substituted to that derived with A = 10 6 in the generated gait with A = 1 (n = 3).The red bars represent control parameters associated with the reflex circuits under focus.https://doi.org/10.1371/journal.pcbi.1011771.g012 Fig 11, and the result is shown in Fig 14.The cyan and olive bar represents the relative R CoT value when we replaced only the optimized functions associated with reflex circuits 1 and 2, respectively.The pink bar represents the relative R CoT value when we replaced the optimized function of both reflex circuits.The results indicated that the modulation of only these two reflex circuits resulted in a comparable or even more substantial reduction in R CoT value compared to the modulation of all 56

Fig 14 .
Fig 14.Relative R CoT values to the generated walking with A = 1 when modulating identified reflex circuits (n = 3).The cyan and olive bar represents the relative R CoT value when only the optimized function of control parameters associated with reflex circuit 1 (i.e.G HFL and l tar HFL) and reflex circuit 2 (i.e.G HAM_HFL and l tar HAM H FL ) were replaced with those derived with A = 10 6 in the generated gait with A = 1, respectively.The pink bar represents when the optimized function of the control parameters associated with both reflex circuits (i.e., G HFL , l tar HFL , G HAM_HFL , l tar HAM H FL ) were replaced.The purple bar indicates the value in the generated gait using the optimized functions derived with A = 10 6 .When circuit 2 was modulated, some generated gaits did not follow the specific input v tar vel , with their velocities being changed by more than 0.05 m/s.Data acquired from the target velocity that resulted in these outliers were excluded from the calculation of estimated CoT curves.

•
Fig 18 illustrates the relative RCoT values to the baseline (A = 1) under various conditions.We found that modulating reflex circuits 1 and 2 resulted in decreased R CoT values, comparable to the result for all 56 parameter modulation (generated gaits can be seen in S1 Video).

Fig 15 .Fig 16 .
Fig 15.Energy consumed by individual muscles compared to the generated gait with A = 1 during a 30 m walk.v tar x was set to 1.0 m/s.The pink bars represent the values when the optimized functions of the control parameters associated with reflex circuits 1 and 2 are replaced with those derived with A = 10 6 in the gait generated with A = 1, and the purple bars represent the values in the gait with A = 10 6 .The negative value indicates that the energy consumption at the muscle was reduced compared to the generated gait with A = 1, while the positive value indicates increased energy consumption.https://doi.org/10.1371/journal.pcbi.1011771.g015

Fig 17 .
Fig 17.Stimulation to the HFL in the swing phase before and after modulating reflex circuit 1&2.v tar x was set to 1.0 m/s.The blue and pink lines are the stimulation to the HFL, which does not drop below 0, at generated gait with A = 1 and after the optimized function of the control parameters associated with reflex circuit 1 and 2 are replaced with those derived with A = 10 6 , respectively.https://doi.org/10.1371/journal.pcbi.1011771.g017

Fig 18 .
Fig 18.Relative R CoT values to the generated walking with A = 1 under different setting parameters (n = 2).In a different body structure setting, the segment lengths of the model were shortened by 20%.In a different neural system setting, the time delay of the sensory information to the controller was doubled.In this setting, stable gaits were not generated for A > 100.In a different cost function setting, the weight coefficients in the objective function (Eq(31)) were changed to α E = 2500, α v = 5, and α t = 0. https://doi.org/10.1371/journal.pcbi.1011771.g018

Table 2 . Individual F 0 .
, it is unreasonable to compare CoT values between data with widely different speeds.Thus, we calculate β relative to the average CoT measured around the generated walking speed.Because the dataset, {(v x1 , Y 1 , CoT 1 ), . .., (v xi , Y i , CoT i ), . .., (v xn , Y n , CoT n )}, is sorted by speed, the average CoT around the jth data point, CoT j , is calculated as follows: