A bicycle can be balanced by stochastic optimal feedback control but only with accurate speed estimates

Balancing a bicycle is representative of the balance control humans perform as part of a whole range of behaviors (walking, running, skating, skiing, etc.). This paper presents a general model of balance control and applies it to the balancing of a bicycle. Balance control has both a physics (mechanics) and a neurobiological component. The physics component pertains to the laws that govern the movements of the rider and his bicycle, and the neurobiological component pertains to the mechanisms via which the central nervous system (CNS) uses these laws for balance control. This paper presents a computational model of this neurobiological component, based on the theory of stochastic optimal feedback control (OFC). The central concept in this model is a computational system, implemented in the CNS, that controls a mechanical system outside the CNS. This computational system uses an internal model to calculate optimal control actions as specified by the theory of stochastic OFC. For the computational model to be plausible, it must be robust to at least two inevitable inaccuracies: (1) model parameters that the CNS learns slowly from interactions with the CNS-attached body and bicycle (i.e., the internal noise covariance matrices), and (2) model parameters that depend on unreliable sensory input (i.e., movement speed). By means of simulations, I demonstrate that this model can balance a bicycle under realistic conditions and is robust to inaccuracies in the learned sensorimotor noise characteristics. However, the model is not robust to inaccuracies in the movement speed estimates. This has important implications for the plausibility of stochastic OFC as a model for motor control.


Introduction
Keeping balance is an important function for many organisms. With this function, the organism controls one body axis relative to gravity, and it achieves this by keeping the body's center of gravity (CoG) above its area of support (AoS). In this paper, I will focus on balancing a bicycle.
However, much of what I will say also holds for other forms of balancing that involve a human body: walking, running, skating, skiing, etc. For example, all forms of balancing a human body involve the same two basic actions for keeping the body's CoG above its AoS: (1) changing the AoS while keeping the CoG fixed (e.g., by stepping out with one leg), and (2) shifting the CoG while keeping the AoS fixed (e.g., by leaning the upper body). Keeping balance during walking is an active field of research, and the recent review paper by Bruijn and Van Dieën (1) gives a good overview.
Keeping balance is a sensorimotor control problem: the central nervous system (CNS) receives sensory information about the body, the body-attached tools (bicycle, skates, skis, …), and their environment (turn radius, speed, …), and uses this information for calculating actions with which it controls the position of body and tools relative to gravity. The dominant model for sensorimotor control assumes that the CNS makes use of an internal model to determine these control actions [2,3]. In some publications [4,5], a distinction is made between forward and inverse internal models, but here I will only consider forward models; the inverse model will be denoted as the feedback control law. The (forward) internal model simulates the dynamics of the plant (body plus body-attached tools) it attempts to control.
A very influential version of this model claims that this control is optimal in the sense that it minimizes a cost functional that depends on movement precision (here, deviation from the upright position) and energetic costs [2,3]. This model is called optimal feedback control (OFC), and in this paper I will apply the model's stochastic version to bicycle balance control; the deterministic version would predict that the CoG stays exactly above the AoS once this position is reached, which is unrealistic.
To evaluate the plausibility of stochastic OFC as a model for bicycle balance control, one must address at least the following questions: (1) Is the model good enough to balance a bicycle under realistic conditions (lean and steering angles that are observed with real riders), and (2) Is the model robust to inaccuracies in the model parameters? The relevance of robustness follows from the fact that the model parameters must allow for an accurate simulation of the plant dynamics. However, in some inevitable cases (e.g., in the beginning of a learning process), the parameter values cannot be very close to their optimal values, and therefore the model must have some minimal robustness to inaccurate parameter values. Of course, the stabilization performance (indexed by, e.g., lean angle variability) may decrease with parameter inaccuracy, but for a realistic range of values (see Results), the bicycle and rider should not fall over.
It is useful to distinguish different types of inaccuracies with respect to the time it takes to reduce them. On the one extreme, there are inaccuracies that are reduced between (instead of within) cycling trips. These inaccuracies pertain to slowly varying characteristics of the plant (gain factors, moment arms, sensorimotor noise characteristics, …) that the CNS must learn from experience.
At the other extreme, there are the inaccuracies in the state variables (i.e., the variables of the equations of motion). In this paper, all state variables are related to limb configurations and gravity (steering angle, upper- and lower-body lean angles) about which the CNS obtains information via the somatosensory (including proprioception) and the vestibular system.
Inaccuracies in the CNS-estimated state variables are reduced on a timescale that is set by the delays in these sensory systems, which are around 100 ms [6]. Although I will not investigate this in the present paper, the model also requires some minimal robustness to inaccuracies in these estimates. Fortunately, there is good evidence from psychophysical studies that, in healthy humans, the CNS obtains reliable sensory information about the body's orientation relative to gravity: for body orientations near the vertical axis, the noise standard deviation of the CNS's estimate is approximately 4 degrees [7].
In between these two extreme time scales (slowly varying plant characteristics and state variables), there is an intermediate time scale that is characteristic for parameters such as movement speed. According to the literature, movement speed estimates depend on optical flow [8]. However, these estimates are very unreliable, as is clear from their Weber fraction (the smallest step increase in forward optical flow velocity necessary for the difference to be perceived): to perceive an increase within 500 ms, the increase had to be at least 50% [9]. Therefore, a plausible model for bicycle balance control must have some minimal robustness to inaccuracies in movement speed estimates.
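The cited Weber fraction can be turned into a back-of-the-envelope rule. A minimal sketch (the constant-Weber-fraction assumption and the function name are my own illustration, not taken from [9]):

```python
# Hedged sketch: assuming a constant Weber fraction, the smallest step
# increase in forward speed that can be perceived scales with the current
# speed. The 0.5 default reflects the ~50% figure cited in the text.
def min_detectable_increase(speed, weber_fraction=0.5):
    """Smallest perceivable speed increment, in the units of `speed`."""
    return weber_fraction * speed

# At 5 m/s, the forward speed would have to jump by 2.5 m/s (to 7.5 m/s)
# within the 500 ms window before the rider would notice the change.
increment = min_detectable_increase(5.0)
```

This makes concrete how coarse the optical-flow-based speed estimate is at ordinary cycling speeds.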
In the remainder of this introduction, I will first describe the mechanical aspects of bicycle balance control, and next how bicycle balance control can be formulated as a stochastic optimal control problem. In the Results section, I will first introduce a model of sensorimotor control that is based on the idea that a mechanical system (plant) is both controlled and learned by a computational system that uses an internal model to calculate optimal control actions. Next, in a simulation study, I will evaluate (1) whether this model is good enough to balance a bicycle under realistic conditions, and (2) whether it is robust to inaccuracies in the values of two parameters of the computational system, sensorimotor noise characteristics (slowly varying) and movement speed (intermediate time scale).

Problem definition
A standing human is balanced when his center of gravity (CoG) is above his base of support (BoS), which is formed by the soles of his two feet plus the area in between. Balance follows from the fact that the gravitational force (a vector quantity in 3D passing through the CoG) intersects this BoS. The situation is similar but not identical for a bicycle. A stationary bicycle (i.e., a bicycle in a track stand) is balanced when the combined CoG of rider and bicycle is above the one-dimensional line of support (LoS), the line that connects the contact points of the two wheels with the road surface. In this position, the direction of the gravitational force intersects the LoS. However, because of disturbances, the CoG cannot be exactly above this one-dimensional LoS for a finite period. Therefore, a bicycle is considered balanced if the CoG fluctuates around the LoS within a limited range, small enough to prevent the bicycle from touching the road surface.
Compared to a stationary bicycle, the balance of a moving bicycle is more complicated because, besides gravity, also the centrifugal force acts on the CoG. Crucially, the centrifugal force is under the rider's control via the turn radius [10]. The balance of a moving bicycle depends on the resultant of all forces that act on the CoG: a bicycle is balanced if the direction of this resultant force fluctuates around the LoS within a limited range. Besides the forces that act on the CoG, there are also forces that are responsible for the turning of the bicycle's front frame, and some of these do not depend on the rider [11]. These latter forces are responsible for the bicycle's self-stability and will be discussed later (see Bicycle self-stability).

Figure 1: (A) In green, the bicycle rear frame, characterized by its lean angle φ_r over the roll axis (green arrow). In red, the bicycle front frame, characterized by its angle δ over the steering axis (red arrow). In blue, the rider's upper body, characterized by its lean angle φ_u over the roll axis (blue arrow). In black, (1) the steering torque τ_δ and the lean torque τ_u, which are both applied by the rider, and (2) the steering axis angle λ, which is set equal to 90 degrees for the purposes of the present paper (see text). (B) Rear view. In green, the bicycle rear frame (plus lower body) lean angle φ_r (which equals the front frame lean angle). In blue, the rider's upper body lean angle φ_u. The symbol ⨂ denotes the CoG of the upper body (in blue), the lower body (in green), and the combined CoG (in black).

The geometry of the rider-bicycle combination
I assume that the rider sits on the saddle and keeps his feet resting on the non-moving pedals. In this position, the rider's lower body (the hips/pelvis and below) is firmly supported and can be considered a part of the rear frame. This simplification implies that leg movements are not used for balance control. However, the stochastic version of this bicycle model (see Sensorimotor noise and stochastic OFC) allows for pedaling-related movements to be included as motor noise and, in this way, to produce associations between lean and steering angles. In sum, the only forces that can be used for balance control are (1) a steering torque τ_δ on the handlebars, and (2) a lean torque τ_u at the hinge between the rider's upper body and the rear frame, corresponding to the hips.

Cycling involves a double balance problem
Cycling involves a double balance problem, of which I have described the first part: keeping the combined CoG of the rider and the bicycle above the LoS. The second balance problem pertains to the rider only: keeping his CoG above his BoS. In the bicycle model in Figure 1, the rider is represented by his upper body, of which the CoG must be kept above the saddle. The balance problem with respect to the rider is further simplified by only considering the balance over the roll axis (parallel to the LoS), which corresponds to upper body movements to the left and the right. I thus ignore the balance over the pitch axis (perpendicular to the LoS and gravity), which corresponds to upper body movements to the front and the back, typically caused by accelerations and braking. With this simplification, the joint between the rider's upper body and the rear frame is a hinge with a single degree of freedom.

Balance control strategies from a mechanical point of view
For keeping the combined CoG over the LoS (the first balance problem), the relevant control actions must result in a torque over the LoS (roll axis). Within the constraints of our kinematic model, there are two ways for a rider to perform a control action: (1) by turning the handlebars, and (2) by leaning the upper body. To explain these control actions, it is convenient to make use of Figure 1B. This is the schematic of a double compound pendulum, of which the dynamics depend on how it is actuated: (1) if the contact between the green rod and the road surface is controlled by a linear force, the dynamics is known as the "double compound pendulum on a cart" [12], and (2) if the angle between the green and the blue rod (the upper body lean angle) is controlled by a torque at this joint, the dynamics is known as the Acrobot [13].
We first take the perspective of a double compound pendulum on a cart. From this perspective, turning the handlebars moves the contact point of the green rod (representing the combined front and rear frame) with the road surface sideways, back under the combined CoG. In fact, turning the handlebars changes the trajectory of the tire-road contact points and, because the CoG wants to continue in its pre-turn direction (by Newton's first law), this results in a centrifugal force in the bicycle reference frame (of which the LoS is one of the defining axes). This centrifugal force is perpendicular to the LoS and results in a torque over the roll axis in the direction opposite to the turn (a tipping-out torque). This steering-induced tipping-out torque can be used to move the combined CoG to the opposite side of the turn. Thus, steering in the direction of the lean produces a tipping-out torque that brings the combined CoG over the LoS. This explains the name of this control mechanism: "steering to the lean".
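The torque balance behind "steering to the lean" can be sketched with a point-mass toy model (the point-mass simplification and all numbers below are my own illustration, not the paper's SDP):

```python
import math

# Toy model: point mass m at height h above the line of support, riding at
# speed v through a turn of radius r while leaning by angle phi (radians).
def roll_torques(m, h, v, r, phi, g=9.81):
    """Return (gravity_torque, centrifugal_torque) about the roll axis.
    Gravity pulls the CoG further into the lean; the centrifugal force of a
    turn toward the lean pushes it back upright."""
    t_gravity = m * g * h * math.sin(phi)               # tips the bicycle over
    t_centrifugal = m * (v**2 / r) * h * math.cos(phi)  # rights it again
    return t_gravity, t_centrifugal

# In a steady turn the two torques cancel when tan(phi) = v^2 / (g * r),
# so for a given lean there is a turn radius that exactly balances it.
m, h, v, g = 90.0, 1.1, 5.0, 9.81
phi = math.radians(10.0)
r_eq = v**2 / (g * math.tan(phi))
t_g, t_c = roll_torques(m, h, v, r_eq, phi, g)
```

The same relation, tan(φ) = v²/(g r), is the classical steady-turn condition and underlies the energy-efficient lean-into-the-turn strategy discussed later.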
We now take the perspective of the Acrobot, which involves that, by applying a lean torque at the hips, the lean angles of both body parts change. Consequently, the separate CoGs of both body parts are shifted, and this in turn affects the gravity-dependent torques on these body parts.
Crucially, a lean torque at the hips does not shift the combined CoG, and therefore cannot bring this combined CoG above the LoS in a direct way.
However, it can do so in an indirect way, namely by turning the front frame. This is essential for the mechanism via which a bicycle can be balanced when riding no handed. First, when leaning the upper body sufficiently to one side, the bicycle and the lower body lean to the other side. Next, depending on properties of the bicycle (wheel flop, the wheels' gyroscopic forces, the combined CoG [11,14]), leaning the bicycle to one side turns the front frame to the same side. This lean-induced turn of the front frame then initiates the same mechanism as when turning the front frame by means of the handlebars: a change in the trajectory of the tire-road contact points results in a centrifugal force perpendicular to the LoS, producing a torque over the roll axis in the direction opposite to the turn.
This lean torque brings the LoS under the CoG.
For the second balance problem (keeping the upper body's CoG over its BoS), the same two control actions can be used: (1) turning the handlebars, and (2) applying a lean torque at the hips. Turning the handlebars in the direction of the upper body lean produces a lean torque in the other direction (i.e., away from the lean), and this makes it possible to control the upper body lean angle. By applying a lean torque at the hips, this upper body lean angle can be controlled in a more direct way, but at the expense of leaning the bicycle (and the lower body) in the opposite direction.
Because the two balance problems use the same control actions, coordination is required. For example, a torque at the hips can be used to counter the turning-induced centrifugal torque on the upper body: by applying a hip torque of equal magnitude as this centrifugal torque (but opposite direction), the position of the upper body can be controlled.
There exists an energy-efficient alternative for this upper body control strategy, well-known in motorcycle racing: leaning the upper body to the inside of the turn. When the upper body is sufficiently leaned to the inside of the turn, the resulting gravity-induced torque will counter the centrifugal torque on the upper body CoG.

Balancing and steering
When riding a bicycle, the rider typically does not only want to balance his bicycle, but also wants to steer it over a chosen/indicated trajectory. This paper only considers control actions for balancing the bicycle, and therefore will not consider constraints on the trajectory, such as obstacles and bicycle path edges. This pure balance task corresponds to cycling blindfolded on an empty parking lot. After a brief familiarization, most humans can cycle blindfolded on an empty parking lot; a search on social media will show several demonstrations of this.

Bicycle self-stability
At this point, it is necessary to mention the self-stability of the bicycle, the fact that, within some range of speeds, the bicycle is balanced without control actions by the rider [11]. Self-stability is investigated by modelling the rider as a mass that is rigidly attached to the rear frame and does not touch the front frame, allowing the handlebars to move freely. Self-stability depends on multiple factors, such as geometric trail, pneumatic trail, wheel flop, the wheels' gyroscopic forces, and the combined CoG [11,14]. These factors all contribute to the bicycle's tendency to steer in the direction of the lean.
The focus of the present paper is on the rider's contribution to bicycle stability, and therefore I used a model bicycle from which I removed all known factors that contribute to the bicycle's self-stability (see Figure 2).
Specifically, I removed the effects of pneumatic trail and the wheels' gyroscopic forces by replacing the wheels by ice skates (or, equivalently, tiny roller skate wheels). And I removed the effects of geometric trail and wheel flop by choosing a vertical steering axis (i.e., by setting the steering axis angle λ in Figure 1 to 90 degrees), as in most bicycles for BMX and artistic cycling. I also kept the CoG at approximately the same position as on a regular bicycle (i.e., 30 cm in front of the rear wheel contact point), because a CoG with a more anterior position may result in bicycle self-stability [14]. Without all these effects, the bicycle's front frame does not steer in the direction of the lean unless the rider turns the handlebars. Therefore, the balance control strategy for riding no handed that I described before, cannot be used on this model bicycle. In the Methods and Models section, I will describe how this simplified bicycle-rider combination can be modeled as a double pendulum of which the base can be moved by turning the front wheel/skate, and the joint at the hips can be actuated. This model will be called the "steered double pendulum" (SDP).

Linear and nonlinear bicycle models
The SDP is a nonlinear model. This nonlinearity is a desirable property because the objective of the present paper is to demonstrate that a linear control mechanism can balance a nonlinear mechanical system. The most popular bicycle model is linear, and it was proposed by Meijaard and colleagues [11] as a benchmark for studying the passive dynamics of a bicycle. Depending on the model parameters, this linear model is self-stable in some range of speeds.
For comparison with the nonlinear SDP without self-stabilizing forces, I also used a linear bicycle model with self-stabilizing forces (gyroscopic forces plus the forces that depend on geometric trail and wheel flop) to check the generality of some results obtained with the SDP. This linear model will be called the "benchmark double pendulum" (BDP), and it is a combination of an existing benchmark model [11] and the double pendulum.
Another nonlinear bicycle model has been proposed by Basu-Mandal and colleagues [15]. Just like the linear benchmark model [11], this nonlinear model does not include an upper body. However, it is straightforward to produce the BDP by linearizing the double pendulum EoM and combining these with the linear benchmark model. Such an extension would be much less straightforward starting from the nonlinear model by Basu-Mandal and colleagues [15]. This motivates my choice for the SDP and the BDP.
Importantly, the mechanical details of the bicycle model are not crucial for this paper. It is my objective to demonstrate that a linear control mechanism can balance a whole range of bicycle models under realistic conditions. Because almost all mechanical systems are nonlinear, I will focus on the SDP, and use the linear BDP to investigate the generality of the SDP results.

Control and noise forces
For investigating balance control, one must distinguish between control and noise forces. Loosely formulated, control forces are the forces that the rider uses to balance the bicycle. For a more precise formulation, I use the optimal control framework, which defines control actions as the actions that optimize a quantitative performance index. Thus, control forces are the optimal forces for a given performance index.
Noise forces are the difference between the forces that are applied and the optimal control forces. It is useful to distinguish between (1) noise forces that originate from the rider, and (2) noise forces that originate from interactions of the bicycle with the environment (e.g., collisions, gusts of wind). In this paper, I only consider noise forces that originate from the rider. These noise forces affect the balance via the same contact points as the two control forces (the handlebars and the saddle). These noise forces are an important instrument in the simulations that I have run to investigate bicycle balance control: they distort the balance, and this allows me to investigate different stabilizing (balance-restoring) mechanisms.
Balancing a bicycle as a stochastic optimal control problem

Optimal feedback control
Every motor task can be performed in an infinite number of ways, and this is for two reasons: (1) the human body has a very large number of joints that can be used in various combinations to produce the same trajectory of the relevant body part (the combined CoG in a balance task, an effector endpoint in a reaching task, …), and (2) a motor task unfolds over time and can be performed with different speed profiles. Nevertheless, most motor tasks are performed in a highly stereotyped manner. For instance, reaching tasks consistently show roughly straight-line paths with bell-shaped speed profiles [3].
To explain these highly stereotyped actions among skilled performers, Todorov and colleagues [3,16] proposed optimal control theory. This theory uses a scalar cost functional that increases with time-integrated imprecisions and energetic costs. Optimal control involves that the control actions are chosen such that this cost functional is minimized. There are two versions of optimal control that differ with respect to how the sensory feedback is used: (1) a version that assumes a fixed planned trajectory and uses sensory feedback to correct for deviations from this planned trajectory [for an overview, see 2], and (2) a version without a planned trajectory in which the control actions are merely a function of the feedback [3,16-19].
This second version is called optimal feedback control (OFC). I propose a model for bicycle balance control based on OFC. In line with OFC, I thus hypothesize that bicycle balance control does not involve trajectory planning. This hypothesis only applies to the specific task of balance control (as when cycling blindfolded on an empty parking lot) and not necessarily to other aspects of cycling, such as steering a bicycle over an indicated trajectory or an obstacle course. For these other aspects, it is likely that some form of trajectory planning is required.
In previous work, OFC has been mainly applied to reaching tasks [2,16-19]. For such tasks, the overall precision predominantly depends on the precision at the endpoint of the reaching movement. In line with this fact, the cost functional is dominated by imprecisions (distances between the end effector and the reach target) near the endpoint [16]. In contrast, for tasks in which a state must be maintained over time, such as balancing a bicycle, the cost functional must depend on the imprecisions uniformly across the theoretically infinite lean angle trajectory.
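The contrast between an endpoint-weighted cost and a uniformly weighted cost can be made concrete with a toy computation (the error trajectory and the weighting functions below are my own illustration, not the paper's cost functional):

```python
import numpy as np

# Toy illustration: the same imprecision trajectory is penalized very
# differently under a reaching-style cost (only the endpoint matters) and
# a balance-style cost (every instant matters equally).
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
deviation = 0.1 * np.sin(2 * np.pi * t)   # an arbitrary error trajectory

w_reach = (t > 0.9).astype(float)         # only the final 10% counts
w_balance = np.ones_like(t)               # every instant counts equally

cost_reach = np.sum(w_reach * deviation**2) * dt
cost_balance = np.sum(w_balance * deviation**2) * dt
```

Because the example trajectory happens to be accurate near its endpoint, the reaching-style cost is much smaller than the balance-style cost, even though the mid-trajectory errors are identical.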
For applying OFC, one needs the equations of motion (EoM) that describe the dynamics of the system (here, the rider-bicycle combination) as a set of differential equations. The variables of these differential equations are called state variables, and for my two bicycle models (SDP and BDP) they are the following: the steering angle δ, the rear frame lean angle φ_r, and the upper body lean angle φ_u (see Figure 2), plus their corresponding angular rates. In the Materials and Methods, I will derive the SDP EoM from Lagrangian mechanics, and the BDP EoM by linearizing the double pendulum EoM and combining these with the linear benchmark model.
The fact that the SDP EoM are nonlinear has important implications for the use of OFC for stabilization. Specifically, OFC does not provide general results for stabilizing a nonlinear system. However, it provides very useful results for stabilizing a linear system, and this has led to the common practice in robotics to linearize the nonlinear system, apply OFC for linear systems, and use the resulting optimal control signals to stabilize the nonlinear system [13]. I hypothesize that the CNS implements a similar solution for stabilizing a bicycle and the rider's upper body: build an internal linear approximation of the external nonlinear system that the CNS wants to stabilize and use calculations like those from OFC to achieve this. In a later section, Stabilizing a nonlinear mechanical system by linear stochastic OFC, I will describe this model in more detail.
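The robotics practice described above can be sketched on a toy system. The following is a minimal illustration, not the paper's SDP: a single torque-driven inverted pendulum is linearized at its upright fixed point, an LQR gain is computed for the linear approximation (via SciPy's continuous-time Riccati solver), and that linear control law is then applied to the nonlinear plant:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

g, L_pend = 9.81, 1.0   # toy parameters, not the paper's

def omega(x, u):
    """Nonlinear dynamics x_dot = Omega(x, u) of a torque-driven pendulum."""
    theta, theta_dot = x
    return np.array([theta_dot, (g / L_pend) * np.sin(theta) + u[0]])

# Internal linear approximation at the unstable (upright) fixed point.
A = np.array([[0.0, 1.0], [g / L_pend, 0.0]])
B = np.array([[0.0], [1.0]])

# LQR: minimize the integral of x'Qx + u'Ru; the gain follows from the
# continuous-time algebraic Riccati equation.
Q, R = np.eye(2), np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
Lgain = np.linalg.solve(R, B.T @ P)   # control law: u = -Lgain @ x

# Apply the *linear* control law to the *nonlinear* plant (Euler steps).
x, dt = np.array([np.radians(10.0), 0.0]), 0.001
for _ in range(5000):
    x = x + dt * omega(x, -Lgain @ x)
# After 5 simulated seconds the pendulum has returned to near upright,
# even though the controller only knows the linear approximation.
```

This is the pattern hypothesized for the CNS: the controller never sees the nonlinear equations, only their linearization around the fixed point it tries to maintain.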
OFC uses a scalar cost functional to define the optimal control actions. This is in line with the fact that the CNS implements functions for setting goals and evaluating actions. For our application to bicycle balance control, it is natural to define this cost functional as one that increases with (1) deviations between the CoGs (combined and upper body) and their respective support, and (2) the energetic costs of the control actions.
The control actions that result from the minimization of this cost functional are the steering and the lean torque.

Sensorimotor noise and stochastic OFC
Because riders and other biological systems suffer from sensor and motor noise [20], deterministic OFC is an unrealistic model for bicycle balance control. As a result of this noise, the CNS cannot perfectly know nor control the outside world, which includes the body that is attached to the CNS. Specifically, if the sensory feedback is noisy, the CNS cannot infer the state variables perfectly from this feedback. Also, the CNS is unaware of the motor noise that is generated at the muscular level, which is added after the CNS has produced the motor command. Therefore, even if the CNS were able to calculate an optimal motor command based on perfectly accurate state information, that command would not fully control the muscles.
Fortunately, for a system whose behavior depends on noise, optimal control is still defined, namely if the system is governed by linear stochastic differential equations (SDEs) with additive Gaussian noise and a quadratic cost functional. Under these conditions, control is optimal if it is based on an optimal state estimate [21]. The optimality of this state estimate is relative to the conditional probability distribution of the state at time t given the values of all variables at previous times. Therefore, this optimal estimate not only depends on the sensory feedback at time t, but also on the optimal state estimate and the control action (actually, its efference copy) just before this time. This optimal estimate involves the familiar Kalman filter, which weights the sensory feedback in proportion to its reliability. Several empirical studies have suggested that state estimation in the CNS involves this type of weighting in proportion to the reliability of the available information [22-24].
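The reliability weighting performed by the Kalman filter is easiest to see in the scalar case; a minimal sketch (my own notation, not the paper's):

```python
# Scalar Kalman update: the posterior estimate is a weighted average of the
# internal model's prediction and the sensory feedback, with weights set by
# their respective noise variances.
def kalman_update(prediction, pred_var, measurement, meas_var):
    k = pred_var / (pred_var + meas_var)          # Kalman gain, in [0, 1]
    estimate = prediction + k * (measurement - prediction)
    estimate_var = (1.0 - k) * pred_var           # uncertainty always shrinks
    return estimate, estimate_var

# Reliable feedback (small meas_var) pulls the estimate toward the
# measurement; unreliable feedback is largely ignored in favor of the model.
e_reliable, _ = kalman_update(0.0, 1.0, 1.0, 0.1)
e_unreliable, _ = kalman_update(0.0, 1.0, 1.0, 10.0)
```

With a reliable sensor the estimate moves about 91% of the way to the measurement; with a ten-times-noisier sensor it moves only about 9%.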
The ability to correct for motor and sensor noise depends on the CNS's internal model of the dynamics of the plant and the sensory feedback. The CNS uses this internal model to estimate the current state from (1) the previous state, (2) the most recent control action, and (3) the sensory feedback. Several psychophysical [25-27] and neurophysiological [28,29] studies have provided evidence for such internal models. An internal model can be conceived as a set of differential equations that allows the CNS to simulate state variables and to combine this information with the sensory feedback to obtain an optimal state estimate.

The robustness of control based on an internal model
Because an internal model cannot be directly observed, its hypothesized role in sensorimotor control must be evaluated based on its performance.
This performance pertains to how well the optimal controls under a linear approximation can stabilize a nonlinear system. This linear approximation involves several parameters, such as the matrices that define the linear approximation to the nonlinear EoM and the noise covariance matrices (see Stabilizing a nonlinear mechanical system by linear stochastic OFC).
The larger the range of parameter and state values for which the internal model can stabilize the nonlinear system, the more robust the control, and the more likely that the CNS uses a similar internal model for sensorimotor control. In this paper, for two types of parameters (the learned sensorimotor noise characteristics and movement speed), I will determine the range of values for which the internal model can stabilize the nonlinear bicycle model while producing realistic state values. From these values, it can be concluded that the model is robust to inaccuracies in the learned sensorimotor noise characteristics, but not to inaccuracies in the movement speed estimates.

A model for sensorimotor control
The EoM for most dynamical systems are nonlinear. This holds for the SDP model bicycle, but also for common movements such as reaching, throwing, and walking; these movements are all performed by changing joint angles, which results in EoM involving trigonometric functions. I denote the nonlinear EoM as follows:

ẋ = Ω(x, u)

The vector x contains the state variables, and ẋ their first derivatives with respect to time. For the SDP, x = (δ, φ_r, φ_u, δ̇, φ̇_r, φ̇_u)ᵀ and u = (τ_δ, τ_u)ᵀ (see Figure 2). In the Methods and Models section, I derive the SDP EoM from Lagrangian mechanics.
OFC calculates optimal control actions that minimize a cost functional. In Figure 3, I have depicted a model for sensorimotor control that is based on an internal model that is a linear approximation of the unknown nonlinear dynamics Ω(x, u). These nonlinear dynamics are depicted in red and will be denoted as the mechanical system. In its application to balancing a bicycle, this mechanical system corresponds to the rider's body plus his bicycle. In other applications, the mechanical system may also involve objects in the environment that are sensed from a distance using vision and/or audition. The mechanical system receives input from the motor output system (in black), which adds noise to the optimal control action u. The sensory input system (in green) maps the state variables onto sensory variables (as specified by the matrix C), adds noise, and feeds the resulting sensory input into the computational system (in blue).

Figure 3: Sensorimotor control of a mechanical system (in red) by input from a computational system (in blue). The mechanical system is governed by the nonlinear differential equations ẋ = Ω(x, u), and the computational system produces an optimal control action u. The motor output system (in black) adds noise to u and feeds this into the mechanical system. The sensory input system (in green) maps the state variables to sensory variables, adds noise, and feeds the resulting sensory input into the computational system. The computational system calculates an optimal internal state estimate x̂ by integrating a linear differential equation (characterized by the matrices A, B, C, and the Kalman gain K) that takes the sensory feedback as input. The optimal control action u is obtained from x̂ and the LQR gain −L.
The computational system consists of two components: (1) the internal model, which calculates an optimal internal state estimate x̂ by integrating a linear differential equation (characterized by the matrices A, B, H, and the Kalman gain K) that takes the sensory feedback as input, and (2) the feedback control law, which determines the control action by multiplying the state estimate x̂ with the linear quadratic regulator (LQR) gain −L (the minus sign is added for consistency with the existing literature).
The matrices A, B and H must be learned from experience with the mechanical system. Useful reference values for A and B can be obtained from the first-order Taylor approximation of the nonlinear Ω(x, u) at the unstable fixed point. That is, near this point Ω(x, u) can be linearly approximated by Ax + Bu, with A and B being the Jacobians of Ω(x, u) with respect to x and u, evaluated at the unstable fixed point.
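This linearization step can be sketched numerically with finite differences. The dynamics below are a hypothetical torque-driven inverted pendulum standing in for the SDP equations (all values are illustrative assumptions, not the paper's):

```python
import numpy as np

# Sketch of the linearization step: finite-difference Jacobians of a
# nonlinear Omega(x, u) at the unstable fixed point. Omega here is a
# hypothetical torque-driven inverted pendulum, NOT the SDP equations.
g, rod = 9.81, 1.0

def Omega(x, u):
    theta, theta_dot = x
    return np.array([theta_dot, (g / rod) * np.sin(theta) + u[0]])

def jacobians(f, x0, u0, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x0, u0)."""
    n, m = len(x0), len(u0)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for j in range(n):
        d = np.zeros(n); d[j] = eps
        A[:, j] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m); d[j] = eps
        B[:, j] = (f(x0, u0 + d) - f(x0, u0 - d)) / (2 * eps)
    return A, B

A, B = jacobians(Omega, np.zeros(2), np.zeros(1))
# Near the upright fixed point sin(theta) ~ theta, so A ~ [[0, 1], [g/rod, 0]].
print(np.round(A, 3), np.round(B, 3))
```

The same finite-difference recipe applies to any nonlinear Ω(x, u), which is why reference values for A and B are available whenever the EoM can be evaluated.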

Motor and sensor noise
The stabilizing performance of the combined mechanical-computational system (i.e., how close x stays to its target value) is adversely affected by motor and sensor noise: motor noise directly feeds into the mechanical system, and sensor noise degrades the internal state estimate. The model for the motor input to the mechanical system is a simple errors-in-variables model, with standard-normal noise vectors ε⁽ʷ⁾ and ε⁽ᵛ⁾:

u_in = u + Φ^(1/2) ε⁽ʷ⁾,  y = Hx + Ξ^(1/2) ε⁽ᵛ⁾    (Eq. 2)

The scaling matrices Φ^(1/2) and Ξ^(1/2) determine the covariance of the motor noise w = Φ^(1/2) ε⁽ʷ⁾ and the sensor noise v = Ξ^(1/2) ε⁽ᵛ⁾. Specifically, the motor and the sensor noise are normally distributed with covariance matrices Φ and Ξ, respectively.
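This noise model is easy to verify in simulation: scaling standard-normal samples with a matrix square root of Φ (here, the Cholesky factor) yields noise with covariance Φ. The covariance values below are arbitrary illustrations, not values from the paper:

```python
import numpy as np

# Sketch of the noise model in Eq. 2: noise with a prescribed covariance Phi
# is obtained by scaling standard-normal samples with a matrix square root
# (the Cholesky factor plays the role of Phi^(1/2)).
rng = np.random.default_rng(0)
Phi = np.array([[0.04, 0.01],
                [0.01, 0.09]])              # target motor-noise covariance
Phi_half = np.linalg.cholesky(Phi)          # Phi^(1/2), lower triangular

eps_w = rng.standard_normal((2, 200_000))   # unit-variance white noise
w = Phi_half @ eps_w                        # motor noise with covariance Phi

print(np.round(np.cov(w), 3))               # close to Phi, up to sampling error
```

Any matrix square root works here; the Cholesky factor is simply the cheapest to compute.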

Stochastic OFC deals with noise in an optimal way
Stochastic OFC provides the tools to deal with motor and sensor noise, and it does so in an optimal way if the noise is Gaussian and additive [21].
This optimality is central to the proposed model for sensorimotor control, which I now formulate with the detail that is required to simulate it on a computer:

1. The CNS learns from experience the following matrices: A, B, and the covariances of the motor and the sensor noise. For the purposes of this paper, the matrix H that maps x onto y is assumed to be known. The learned noise covariance matrices can be given plausible values, as I will describe in the Results section.
2. The control actions are produced by an internal model that is based on the following linear approximation of the other three systems (the mechanical, motor output, and sensory input systems):

ẋ = Ax + B(u + w),  y = Hx + v    (Eq. 3)
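As a concrete sketch of these two steps, the following stand-in computes the LQR gain −L and a steady-state Kalman gain K and runs the closed loop of Figure 3 in discrete time. The plant is a crudely discretized inverted pendulum with illustrative matrices, not the SDP:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Minimal LQG sketch of the estimator/controller pair: the internal model
# integrates x_hat with the Kalman gain K, and the control is u = -L x_hat.
dt, g = 0.01, 9.81
A = np.array([[1.0, dt], [g * dt, 1.0]])   # unstable linearized stand-in plant
B = np.array([[0.0], [dt]])
H = np.eye(2)                              # sensory map (identity, as in the text)
W = 1e-4 * np.eye(2)                       # process-noise covariance (motor noise)
Xi = 1e-4 * np.eye(2)                      # sensor-noise covariance
Q, R = np.eye(2), np.array([[1.0]])        # state and control cost matrices

# LQR gain L and steady-state (predictor) Kalman gain K from two Riccati equations.
P = solve_discrete_are(A, B, Q, R)
L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
S = solve_discrete_are(A.T, H.T, W, Xi)
K = A @ S @ H.T @ np.linalg.inv(H @ S @ H.T + Xi)

rng = np.random.default_rng(1)
x, x_hat = np.array([0.1, 0.0]), np.zeros(2)
for _ in range(2000):
    u = -L @ x_hat                                         # control from the estimate
    y = H @ x + rng.multivariate_normal(np.zeros(2), Xi)   # noisy sensory input
    x_hat = A @ x_hat + B @ u + K @ (y - H @ x_hat)        # internal model update
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
print(np.abs(x))   # the unstable plant is held near the fixed point
```

By the separation principle, the estimator and the controller can be designed independently, which is why the two Riccati equations above are solved separately.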

Is the optimal model good enough?
For stochastic OFC to be a good model for bicycle balance control, the bicycle and the rider must remain balanced over the range of lean and steering angles that is observed in real riding. Importantly, the optimality of stochastic OFC does not automatically ensure that the model is also good enough in this respect [30]. Specifically, in this simulation study, I will evaluate whether stochastic OFC with optimal parameter values can balance the model bicycle for steering and lean angles that are observed with real riders on real bicycles, without requiring steering angular rates that no real rider can produce.
However, it is unlikely that the linear model's parameters are exactly at their optimal values, and the possible consequences of this are discussed next.

Which parameters are responsible for stabilization failures?
Stabilization may fail (i.e., bicycle and rider fall over) because of motor and sensor noise. However, stabilization also depends on the parameters of the computational system, and in this paper, I will investigate the role of a few of these parameters. The computational system is fully specified by the following seven matrices: A, B, H, Σ, Ψ, Q, and R, and I will investigate the role of the following three: A, Σ, and Ψ.
It is useful to distinguish between the static and the dynamic parameters of the computational system: the dynamic parameters are the state variables x, and the static parameters are the seven matrices on which these state variables depend. From a theoretical perspective, it is a matter of choice whether a parameter is considered static or dynamic. However, from an applied perspective (here, bicycle balance control), it is important to know the time scale over which the parameters are likely to change.
This is related to the robustness of the computational system: if the system is not robust to inaccuracies in some parameter, then the organism needs a mechanism to correct these inaccuracies. If this mechanism is slow (more than a day), it is usually called "learning" (offline updating), and if it is fast, it is usually called "sensory feedback" (online updating). For the model considered here, the CNS must learn the internal noise covariance matrices Σ and Ψ from experience with the mechanical system and the motor and sensor noise. The required learning rate is set by the robustness of the computational system: the more robust the computational system is to inaccuracies in Σ and Ψ, the slower the learning rate may be.
The matrix A is also a dynamic parameter because it depends on the bicycle speed v, which changes over time: A = A(v(t)). The time scale of the bicycle speed changes is on the order of seconds (e.g., accelerating from 0 to 1.5 m/s takes about 1 s). Thus, the CNS probably needs online updates of the bicycle speed. Crucially, the robustness of the computational system to inaccuracies in the speed estimates becomes more important as these updates become less reliable. This is relevant here because there is good psychophysical evidence against reliable speed estimates based on optical flow [9]. Thus, a plausible computational system must be robust to inaccurate speed estimates.
In sum, the CNS must learn and/or estimate some parameters of the computational system. Because this process takes time, the system's performance must be robust to inaccuracies in these parameters. I investigated this robustness in two simulation studies in which I manipulated the accuracy of (1) the learned noise covariance matrices Σ and Ψ, and (2) the system (state) matrix A. These two parameter sets correspond to two different aspects of the environment that the CNS must learn: (1) the reliability of the motor output and the sensory input, and (2) the physical laws that govern the movements of our body and bicycle.
They also play different roles in the computational model: the learned noise covariance matrices only affect the Kalman gain K (which updates the internal state estimate), whereas the learned system matrix A also affects the LQR gain −L (which maps the state estimate onto the control action).
How plausible is stochastic OFC as a model for sensorimotor control?

I next investigated whether the results for the SDP generalize to a linearized model with self-stabilizing forces, the BDP. The BDP is a combination of an existing benchmark model for studying the passive dynamics of a bicycle [11] and the double pendulum.
The simulations for the BDP were performed in the same way as for the SDP, and the results are shown in Figure 4B and 4D. Crucially, to obtain approximately the same percentage of skidded trials in the BDP as in the SDP simulations, the noise amplitude had to be increased by a factor of approximately 14 (compare the x-axes of Fig. 4A and 4B). This shows that the BDP is much less susceptible to noise than the SDP. This is most likely due to the positive trail of the BDP, which is responsible for caster forces in the front frame. Caster forces reduce the impact of the noise on the handlebars because they align the front wheel with the rear frame [31].
Except for the reduced susceptibility to noise, the results for the BDP are like those for the SDP: even for the highest noise level, the RMS combined CoG lean angle is well below the upper bound for a comfortable lean angle (see Figure 6). For the BDP, the model's performance was unaffected by the difference between the learned and the actual noise.
In sum, both for the SDP and the BDP, the stabilization is robust to inaccuracies over two orders of magnitude for the learned motor and sensor noise covariances. For the SDP, the stabilization is only robust to learned motor noise covariances that are too small and learned sensor noise covariances that are too large. For the BDP, this robustness is uniform.

Is the model robust to inaccuracies in the system matrix due to speed misestimation?
To investigate the robustness to inaccuracies in the system matrix, I used the fact that the optimal system matrix A(v) depends on the bicycle speed v, and computed the internal model's system matrix at a misestimated speed, expressed as a fraction of the true speed. For the SDP simulations, I set the motor and sensor noise amplitudes to 0.015, for which no skidding occurs when the optimal system matrix is used (see Fig. 4). The results in Figure 7 (panels A and C) show that SDP stabilization strongly depends on the accuracy of the speed estimate: successful trials were only found for speed fractions between 0.917 (51% completed) and 1 (100% completed). The robustness is asymmetrical around the true speed: there is a small range of underestimated speeds (fractions 0.9333 to 1) that allow for stabilization, but for overestimated speeds this range is much smaller (less than from 1 to 1.0167).
For the BDP simulations, I set the motor and sensor noise amplitudes to 0.1944, for which no skidding occurs when the optimal system matrix is used. The pattern of results for the BDP is like the one for the SDP (see Figure 7, panels B and D), but the range of speed fractions that allows for BDP stabilization is much wider than for the SDP: from 0.725 to 1.1625. The risk of stabilization failures is again at the high end of the speed estimates.
In sum, compared to the robustness to inaccuracies in the learned noise covariance matrices (over two orders of magnitude), the stabilization is much less robust to inaccuracies in the system matrix that result from misestimation of the bicycle speed. This holds for both bicycle models.
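This kind of speed sensitivity can be illustrated with a toy linear system whose system matrix depends on speed. A_of(v) below is invented for illustration and is not the SDP or BDP dynamics; the sketch designs the LQR gain at an estimated speed and checks closed-loop stability against the true dynamics:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy illustration: compute the LQR gain from a system matrix evaluated at
# an estimated speed v_est, then check the closed-loop eigenvalues against
# the true dynamics A(v_true).
def A_of(v):
    return np.array([[0.0, 1.0],
                     [9.0 - 0.5 * v**2, -0.1 * v]])

B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
v_true = 4.0

def max_closed_loop_eig(v_est):
    """Largest real part of eig(A(v_true) - B L), with L designed at v_est."""
    P = solve_continuous_are(A_of(v_est), B, Q, R)
    L = np.linalg.solve(R, B.T @ P)
    return np.max(np.linalg.eigvals(A_of(v_true) - B @ L).real)

print(max_closed_loop_eig(v_true) < 0)   # gain matched to the true speed: stable
for v_est in (2.0, 4.0, 5.0):
    print(v_est, max_closed_loop_eig(v_est))
```

LQR theory guarantees stability only when the design matrix matches the true dynamics; how far the mismatch can grow before the closed loop destabilizes depends on the system, which is exactly what the SDP/BDP simulations quantify.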

Discussion
I proposed and evaluated a model for sensorimotor control. The central concept in this model is a computational system, implemented in the CNS, that not only controls but also learns a mechanical system that exists outside the CNS. At the interface between these two systems, there is a motor output system that transfers a control signal to the mechanical system, and a sensory system that maps the state of the mechanical system into the computational system. The computational system can simulate the combined mechanical, motor output, and sensory input system. It does so by means of a learned approximation of (1) the physical laws that govern the mechanical system, (2) the mapping performed by the sensory input system, and (3) the statistics of the motor and sensor noise. In the first simulation study, I demonstrate that the model can balance a bicycle and its rider under realistic conditions. In the second study, I demonstrate that the model's stabilization performance is robust to inaccuracies in the learned noise covariance matrices. For the SDP, this requires the qualification that the stabilization performance is robust to learned motor noise covariances that are too small and learned sensor noise covariances that are too large; for the BDP, the robustness is uniform across the noise levels. The third simulation study shows that, compared to the robustness to inaccuracies in the learned noise covariance matrices, the stabilization is much less robust to inaccuracies in the system matrix that result from misestimation of the bicycle speed.
All three simulation studies showed that the BDP is much less susceptible to noise than the SDP. This is probably due to the positive trail of the BDP, which generates caster forces in the front frame. These forces reduce the impact of the noise on the handlebars by aligning the disturbed front wheel with the non-disturbed rear frame [31].
As holds for every model, my model is only an approximation of reality. It is important to be aware of a few aspects for which I made a choice for the sake of computational feasibility or simplicity. Not all of these choices are inevitable, and more work is needed to extend the model so that it can perform all the computations that the CNS performs. The first useful extension follows immediately from the third simulation study: if the model must apply to a wide range of speeds, a mechanism must be added for accurate speed estimation and for selection of the appropriate system matrix. Because of its large Weber fraction [9], it is doubtful that optic flow is the only source of information for speed estimation. This is supported by the fact that many experienced cyclists can ride on stationary bicycle rollers.
The second aspect to be aware of is that the computational system is based on a linear approximation of the unknown mechanical system.
Although it is difficult to argue against the idea that the internal model must be based on some sort of approximation, there is no reason why it should be linear and optimal for a single point (i.e., the unstable fixed point). For example, if it were the optimal linear approximation at the unstable fixed point, and the bicycle rider had learned the linear coefficients from experience with lean angles below 5 degrees, then this linear approximation would still allow him to simulate the linear ODE in Eq. 3 for much larger lean angles than he is familiar with. This would allow him to balance his bicycle outside the range he is familiar with. Whether this is actually possible is still an empirical question.
The third aspect to be aware of pertains to the biological delays between the state estimate x̂ and (1) the mechanical system input (the motor delay), and (2) the sensory feedback (the sensory delay). The motor delay is caused by the fact that the control action must pass via motor axons and muscles before it affects the mechanical system, and the sensory delay is caused by the fact that the sensory feedback must pass via a series of sensory neurons before it arrives in the computational system. In my model, I assumed both delays to be zero, which is unrealistic. With respect to the motor delay, for a model that only estimates the current state x̂(t), the following must hold:

u(t + Δ_mot) = −L x̂(t)

in which Δ_mot is the motor delay. Even for a small motor noise and a state estimate x̂ that approximates the mechanical system states very well, the torque u(t + Δ_mot) will not stabilize the mechanical system if x(t + Δ_mot) differs too much from x(t). This is a well-known problem in sensorimotor control, and it has been proposed that the prediction of future states may solve it [32][33][34][35][36][37]. This implies that the estimate x̂(t) is replaced by a prediction x̃(t, Δ_mot), which extrapolates the estimate at time t over the motor delay.

The fourth aspect to be aware of is that the model does not incorporate the conversion from neural firing rates to joint torques. It is possible to extend the model such that it incorporates this firing-rate-to-torque conversion, and this requires knowledge of the muscular physiology. Specifically, if the optimal control action u is a vector of firing rates, then one needs a new matrix B̃ that must be decomposable as follows:

B̃ = B_{ẋ←T} B_{T←u}

in which B_{T←u} specifies the mapping from the firing rate vector u onto the joint torques T, and B_{ẋ←T} (the old matrix B) specifies the mapping from the joint torques onto the state derivatives ẋ. The matrix B_{T←u} must be specified based on knowledge of muscular physiology, and the matrix B_{ẋ←T} can be calculated as the Jacobian of Ω(x, u) with respect to u, evaluated at the unstable fixed point.
The fifth aspect to be aware of is that the control actions are only two-dimensional (steering and hip torque), whereas the number of balance-relevant muscles and joints is much larger. This simplification can be motivated by the fact that the relevant control input is strongly constrained by the geometry of the bicycle and the rider's position on it.
This simplification is specific to balancing a bicycle, and this points to the challenges one may encounter when extending the model to other forms of balance control (e.g., cycling while standing on the pedals, walking, running, skating, skiing). In principle, the extension is straightforward, as it only requires the EoM for this other form of balancing. However, the challenging part may be the derivation of the EoM, which starts by identifying the balance-relevant joints and selecting the ones that can be actuated. Once the EoM are derived, the linearization and the calculations for the computational system are identical to those for balancing the SDP.
The sixth aspect to be aware of is that the current sensory model is underspecified: it assumes that the sensory input is identical to the state variables (as implemented by the assumption that the matrix H is the identity matrix) plus some noise. From sensory neurophysiology, it is known that information about the state variables (steering, lower body, and upper body angles and angular rates) must be obtained from the somatosensory and/or the vestibular system, but the details of that knowledge are not yet incorporated in the model.
The seventh aspect to be aware of pertains to the assumption that the motor and the sensor noise are additive, although there is good evidence that motor noise is multiplicative [38,39]. The advantage of additive over multiplicative noise is that it is much easier to derive the optimal control actions. For multiplicative noise, optimal control actions were derived by Todorov and colleagues [3,16], but these are restricted to movements with a finite horizon (e.g., pointing, reaching, throwing, hitting). Keeping balance is an infinite horizon problem (i.e., the cost-to-go functional is an integral from zero to infinity), and this requires mathematical results for which a convenient computational implementation is not yet available [40,41].

Concluding, I have proposed and evaluated a model for sensorimotor control that is based on the idea that a computational system in the CNS learns and controls an external mechanical system. This control is optimal in the sense of stochastic OFC. The model can balance a bicycle and its rider under realistic conditions and is robust to inaccuracies in the learned noise covariance matrices. It is not robust to inaccuracies in the learned system matrix caused by a misestimation of the speed. The model is a very useful starting point for investigations into human balance control, and there are several ways in which it can be extended to provide a more realistic account.

Methods and models
Equations of motion (EoM) for the steered double pendulum (SDP)

The SDP is depicted schematically in Figure 8. The SDP contains ingredients of three familiar models: the double compound pendulum on a cart [DCPC,12], the Acrobot [13], and the torsional spring-mass-damper system. Roughly speaking, the SDP is a double compound pendulum of which the base can be steered by a wheel (instead of a cart) and the joint between the two rods (at the hips) can be actuated, as in the Acrobot.
Both actuated joints, one at the handlebars and one at the hips, are modeled as torsional spring-mass-damper systems. I will denote the lower and the upper rod as, respectively, the lower and the upper body. The lower body represents the rear frame plus the rider's lower body; the upper body only represents the rider's upper body.

Figure 8: The inertial reference frame has an arbitrary origin, and the rider/bicycle-centered reference frame has its origin at the orthogonal projection of the combined CoG on the LoS. These reference frames have parallel coordinate axes. In green and blue, I depict the lean angles of the lower and the upper body (θ_l and θ_u), and in red, I depict the yaw angle of the LoS. The horizontal plane (road surface) is colored light yellow.
The kinematic model

Figure 8 depicts the relevant kinematic variables in both an inertial (yellow origin) and a rider/bicycle-centered (purple origin) reference frame.
The inertial reference frame has an arbitrary origin, a vertical coordinate axis V perpendicular to gravity, and an arbitrary horizontal coordinate axis H perpendicular to V. The rider/bicycle-centered reference frame has its origin at the orthogonal projection of the combined CoG on the LoS, and a vertical and horizontal coordinate axis V' and H' that are parallel to those of the inertial reference frame. The rider/bicycle-centered reference frame is non-inertial because, when the bicycle turns, the origin no longer moves in a straight line, and therefore accelerates in the inertial reference frame.
I will use the rider/bicycle-centered reference frame to define the relevant kinematic variables. For the SDP EoM, one must know the precise dependence of the yaw rate ψ̇ on the steering angle δ. Deriving this dependence is a well-known problem in vehicle dynamics [42]. For steering angles −20° < δ < 20°, all deviations from linearity are less than 0.36%, and I will continue to use this linear approximation. Following [42], one can obtain the centrifugal acceleration a_c(δ) from the turn radius r(δ): a_c(δ) = v²/r(δ). For a constant speed v, the centrifugal acceleration is only a function of the steering angle δ.
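The centrifugal acceleration can be sketched with the standard single-track kinematics, assuming a vertical steering axis and wheelbase w_b so that r = w_b / tan(δ); this is the textbook approximation, not necessarily the paper's exact expression, and the wheelbase value is illustrative:

```python
import numpy as np

# Kinematic sketch: with a vertical steering axis and wheelbase w_b, the
# rear frame follows a circle of radius r = w_b / tan(delta), so at constant
# speed v the centrifugal acceleration is a_c = v**2 / r.
w_b = 1.02   # wheelbase in metres (illustrative assumption)

def a_c(v, delta):
    """Centrifugal acceleration (m/s^2) at speed v (m/s), steering angle delta (rad)."""
    return v**2 * np.tan(delta) / w_b

print(a_c(4.0, np.deg2rad(5.0)))   # at 4 m/s with a 5-degree steering angle
```

Because tan(δ) ≈ δ for small angles, a_c is approximately linear in δ at a fixed speed, consistent with the linearity noted above.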

The steering model
The steering model assumes that the steering angle is fully controlled by rider-applied forces on the handlebars. Thus, I ignore all forces that may contribute to the bicycle's self-stability.
The steering assembly consists of the front wheel, the fork, the handlebars, and the rider's arms. I model this assembly as a torsional spring-mass-damper system:

I_s δ̈ + c_s δ̇ + k_s δ = T_δ    (Eq. 8)

In Eq. 8, I_s is the assembly's rotational inertia, c_s its damping, and k_s its stiffness. The input to the steering assembly is the net torque produced by the rider's muscles, denoted by T_δ on the right-hand side of Eq. 8.

The double compound pendulum with a steer-actuated base
The crucial difference between Eq. 9 and the corresponding equation for the DCPC is that the centrifugal acceleration a_c(δ) replaces the acceleration of the cart. The constants in Eq. 9 are defined in Eq. 10. The constants m_l, l_l, c_l and J_l are, respectively, the mass, the length, the distance to the CoG (l_l/2) and the mass moment of inertia of the double pendulum's first rod, which represents the bicycle and the rider's lower body. The constants m_u, l_u, c_u and J_u are defined in the same way, but now for the second rod. The EoM of the double compound pendulum are obtained from Eq. 9 by removing the terms that correspond to the centrifugal acceleration a_c(δ), the stiffness, and the damping. I evaluate these EoM at the fixed point θ_l = θ_u = 0 and replace sin(θ) by its linear approximation near 0: sin(θ) ≈ θ. The resulting coefficients (defined in Eq. 10) contribute elements to the matrices A and B; I use the notation DP(i, j) (i, j = 1, 2) to denote these elements. Finally, I added the same stiffness and damping terms that were also added to the SDP.
Realistic constants for the SDP and the BDP

I grouped the constants of the two bicycle models in several sets. Within every set, the constants for the SDP are described first, followed by those for the BDP (Eqs. 12 and 13). For a damping ratio ζ < 1, the steering assembly oscillates in response to torque input. Because this does not happen in reality, ζ must be at least 1. The smaller the damping ratio ζ, the faster the response of the steering assembly, which is advantageous for stabilization. I will therefore consider the most responsive non-oscillating steering assembly and set ζ = 1.
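The role of the damping ratio in Eq. 8 can be checked numerically. The inertia and stiffness values below are illustrative assumptions, not the paper's fitted constants; the damping follows from c = 2ζ√(I_s k):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Step response of the torsional spring-mass-damper model
# I_s*theta'' + c*theta' + k*theta = T (the form of Eq. 8).
I_s, k, T = 0.07, 2.0, 1.0

def step_response(zeta, t_end=10.0):
    c = 2.0 * zeta * np.sqrt(I_s * k)   # damping from the damping ratio zeta
    def rhs(t, y):
        theta, theta_dot = y
        return [theta_dot, (T - c * theta_dot - k * theta) / I_s]
    return solve_ivp(rhs, (0.0, t_end), [0.0, 0.0], max_step=0.01).y[0]

th_crit = step_response(1.0)    # zeta = 1: approaches T/k without overshoot
th_under = step_response(0.3)   # zeta < 1: overshoots and oscillates around T/k
print(th_crit.max() <= T / k + 1e-6, th_under.max() > T / k)   # True True
```

At ζ = 1 the response is the fastest that still shows no oscillation, which is the rationale for the choice above.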

Lengths and masses of the bicycle and upper body models
The SDP models the bicycle (plus lower body) and the upper body as rods. The objective is to find the time at which the angular rate θ̇(t) is the highest.

Self-stability of the BDP
The strictly monotone transformation of θ̇(t) is a concave function of t, and therefore the maximum of θ̇(t) can be found by setting its derivative with respect to t to zero. The solution of this equation is t = τ. Thus, the time constant τ is the time after movement onset at which the angular rate is the highest.
What are realistic turn radii, lean angles, and steering angular rates?
I start from the following requirements: (1) the turn radius may not be less than the minimum radius below which the tires lose traction, (2) the lean angle may not exceed an upper bound above which most riders would feel uncomfortable, and (3) the steering angular rate may not exceed an upper bound that no real rider can produce. Finally, to find an upper limit for the steering angular rate, I start from the fastest hand movement observed in a reaching task, which is 4 m/s [46]. Combining this linear velocity with a typical commuter handlebar width of 0.6 m (i.e., a lever arm of 0.3 m), I find a critical steering angular rate of 13.33 rad/s.
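The steering-rate limit above is a one-line calculation, worked out here for transparency:

```python
# The fastest observed hand speed (4 m/s, [46]) acting at a lever arm of
# half the handlebar width (0.6 m / 2 = 0.3 m) gives omega = v / r.
hand_speed = 4.0        # m/s
handlebar_width = 0.6   # m
omega_max = hand_speed / (handlebar_width / 2.0)
print(round(omega_max, 2))   # 13.33 rad/s
```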
Simulating the stabilization of the mechanical by the computational system

I have written computer code in Matlab for simulating the stabilization of the mechanical system by the computational system and for visualizing the results.
This code is added to the supplementary material for this paper. With this code, one can perform all the simulations on which I have reported in this paper as well as variations inspired by one's own questions and hypotheses.
Running simulations is only possible in discrete time, and I must therefore discretize the continuous time model. This is the main topic of this section.

Simulating the combined system in discrete time
The discrete time axis is defined by the increment Δt: 0, Δt, 2Δt, 3Δt, …
The model in Figure 3 involves a closed loop, and to describe it, one can start at any point. Here, I start from the sensory input system, which receives the state x(t) from the mechanical system and feeds the noise-corrupted sensory input y(t) = Hx(t) + v(t) into the computational system. This is depicted schematically in Figure 9.

Solving the discrete time computational and mechanical system
For simulating the combined system, one must solve the discrete time mechanical and computational system. For the mechanical system, this involves finding x(t + 2Δt) by numerically integrating ẋ = Ω(x, u_in) = Ω(x, u + w) over the interval [t, t + 2Δt], starting from the initial condition x(t) and with external input u = u(t + Δt) and w = w(t + Δt). For this, I used the Matlab function ode45, which is based on an explicit Runge-Kutta (4,5) formula [47].
To solve the discrete time computational system, I follow a similar approach, but now take advantage of the fact that an explicit solution of the linear model exists. The matrices A_dis, B_dis, and H_dis follow from the well-known solution of a linear state-space model with defining matrices A, B, and H: A_dis = e^{A(2Δt)}, B_dis = A⁻¹(A_dis − I)B, and H_dis = H [48]. Note that the sensory input y (see Figure 9) is evaluated at a different time than the simulated state variable x, because the former is obtained from the mechanical system. The discrete time cost functional is minimized by control actions u = −L_dis x̂, in which −L_dis is the discrete time LQR gain (which depends on the matrices A_dis, B_dis, Q, and R), and x̂ is the optimal state estimate defined by this discrete time update equation:

x̂(t + Δt) = (A_dis − K_dis H_dis) x̂(t − Δt) + B_dis u(t − Δt) + K_dis y(t)    (Eq. 16)

The matrix K_dis is the discrete time Kalman gain, which depends on A_dis, H_dis, Σ_dis, and Ψ_dis.
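The discretization formulas can be sketched as follows; A and B are an arbitrary invertible example rather than the SDP matrices, and the step is a generic dt:

```python
import numpy as np
from scipy.linalg import expm

# Exact discretization of a linear state-space model over a step dt:
# A_dis = expm(A*dt) and B_dis = inv(A) (A_dis - I) B, valid when A is
# invertible.
dt = 0.01
A = np.array([[0.0, 1.0], [2.0, -0.3]])
B = np.array([[0.0], [1.0]])

A_dis = expm(A * dt)
B_dis = np.linalg.solve(A, (A_dis - np.eye(2)) @ B)

# For small dt these approach the Euler factors I + A*dt and B*dt.
print(np.allclose(A_dis, np.eye(2) + A * dt, atol=1e-3),
      np.allclose(B_dis, B * dt, atol=1e-4))   # True True
```

Unlike Euler discretization, this solution is exact for any step size, which is why it is preferred for the linear computational system.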

Supporting information
All results can be reproduced and extended using a set of Matlab scripts and functions. This set is documented in the live script BicBalOFC.mlx.

Figure 9: Schematic representation of the simulation of the combined system in discrete time. In red, green, blue and black, I show the variables that are generated in, respectively, the mechanical, the sensory input, the computational, and the motor output system.