Figure 1.
The time window and the error propagation dynamics used for calculating the TiPI, eq. (11).
In principle, the process is considered many times, always with the same starting value but with different realizations of the noise. Note that, when using the one-shot gradients, only one realization is needed.
Figure 2.
The hysteresis cycle in the gradient picture.
The diagrams show the stages of one hysteresis cycle starting from (A) with the state at
as represented by the sphere. Decreasing
creates the asymmetric situation (B). If
the saddle-node bifurcation happens, i.e., both the maximum at z = 0 and the right minimum disappear, so that the system shifts to the left minimum of the potential (C). Increasing
until
brings us back to the initial situation with the state shifted to the other well, see (D,E). The diagrams (F) and (G) depict the switching from the minimum at
to the minimum at
by increasing
. By decreasing
until
the hysteresis cycle is finished, see (H,I).
Figure 3.
State and parameter dynamics in the one-dimensional system.
(A) Only dynamics (fixed
); the bias
oscillates around zero and causes the state
to jump between the positive and negative fixed points. The TiPI is seen to increase steadily until it eventually drops back when the state is jumping. (B) With full dynamics (
).
increases until it oscillates around its average at
where the hysteresis cycle starts. Parameters:
,
,
in (B),
.
Figure 4.
The probability density distributions with different time windows of the stochastic process in an asymmetric double well potential.
The mean first passage time of switching between wells is one characteristic time constant of the process [53],
which increases exponentially with the barrier height. When the process is observed in a window of length
, the distribution in (A) results. In that situation, the TiPI is maximal if the wells are of equal depth (
). However, with windows of length
, the system state will be predominantly in one of the wells, generating the distributions shown in (B) and (C). Gradient ascent on the TiPI will decrease the well depth as long as the probability mass is still concentrated in that well. This is what drives the hysteresis cycle depicted in Fig. 2.
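The exponential dependence on the barrier height mentioned above can be made concrete with the standard Kramers estimate for an overdamped particle. This is only a sketch under assumptions that are not taken from the caption: the dynamics are written as dz = -U'(z) dt + sqrt(2D) dW, with a hypothetical potential U, noise strength D, well minimum z_min, barrier top z_max, and barrier height ΔU = U(z_max) - U(z_min); the noise convention used in the paper may differ.

```latex
\tau_{\text{escape}} \;\approx\;
\frac{2\pi}{\sqrt{U''(z_{\min})\,\lvert U''(z_{\max})\rvert}}\;
\exp\!\left(\frac{\Delta U}{D}\right),
\qquad \Delta U = U(z_{\max}) - U(z_{\min}),\quad \Delta U \gg D .
```

Under these assumptions, an observation window much longer than the escape times of both wells samples both wells, whereas a much shorter window typically sees the state confined to a single well.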
Figure 5.
The robot has 12 hinge and 6 slider joints, each actuated by a servo motor and equipped with a proprioceptive sensor measuring the joint angle or slider length. The robot is strongly underactuated, so that it cannot take on a wheel-like form in which locomotion would be trivial.
Figure 6.
Role of the fast synaptic dynamics: depending on the speed of the synaptic dynamics defined by , the locomotion properties change drastically.
Depicted is the distance traveled by the robot in 10 min of simulated time on an empty plane. The inset gives a close-up view for low , demonstrating that the locomotion starts only if
exceeds a certain threshold value. Shown are the mean and standard deviation of 10 runs each. Update frequency 25 Hz.
Figure 7.
Regular locomotion pattern and interaction with the environment.
Plotted are the center positions of the 6 rigid segments in space for an interval of 40 s. One line is highlighted for visibility. The trajectory starts while the robot is moving to the left (A); the robot hits the wall (B) (black box) and then locomotes to the right (C), showing a very regular pattern. It then overcomes an obstacle (D), hits the wall (E), and moves back (F). The behavior is cyclic. Parameter: .
Figure 8.
Armband robot surmounting an obstacle and inverting speed at a wall.
Screenshots from the simulation for Fig. 7. The order is row-wise from left to right. The last two pictures show the situation after switching off the parameter dynamics for a few seconds (the robot stops) and enabling it again (the robot starts moving).
Figure 9.
The humanoid robot in four different scenarios.
(A) Normal environment with flat ground. (B) The robot is hanging from a bungee-like spring. (C) The robot is attached to a high bar. (D) The robot has fallen into a narrow pit.
Figure 10.
Parameter similarity for the behavior in different environments (Fig. 9).
Plotted are the results of a hierarchical clustering based on the difference between the parameters in each of the simulations (averaged over time). For each of the four environments there are three initial poses: (straight upright),
and
slanted to the front. The parameters for runs in the same environment are clustered together. This supports the observation that the embodiment plays an essential role in the generation of behavior. More importantly, the physical conditions are reflected in the parameters and are thus internalized. We used the squared norm of the difference of the absolute values of the matrix elements. The absolute values were used because rotation matrices are a common structure in the parameters, and for these the same qualitative behavior is obtained with inverted signs. Parameters:
,
, update frequency 50 Hz.
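The distance described above (squared norm of the difference of the absolute values of the matrix elements) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the 2×2 rotation example are hypothetical, and matrices are represented as plain lists of rows.

```python
import math


def abs_param_distance(a, b):
    """Squared Frobenius norm of |a| - |b| for two equally shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return sum(
        (abs(x) - abs(y)) ** 2
        for ra, rb in zip(a, b)
        for x, y in zip(ra, rb)
    )


def rotation(theta):
    """2x2 rotation matrix, used here only as a hypothetical test case."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def negate(m):
    """Return the matrix with all signs inverted."""
    return [[-x for x in row] for row in m]


r = rotation(0.3)
d_flip = abs_param_distance(r, negate(r))        # sign-inverted copy
d_other = abs_param_distance(r, rotation(1.2))   # different rotation angle
```

In this sketch a matrix and its sign-inverted copy have distance zero, which is the property the caption gives as the reason for taking absolute values. The resulting pairwise distances could then be passed to any standard hierarchical clustering routine.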
Figure 11.
Dimensionality of behavior on different time scales.
Humanoid robot in the bungee setup running for 40 min with different control settings. The sensor data are partitioned into chunks of a fixed length; the graph depicts the effective dimension as a function of the chunk length for the different settings. In order to test the method, we start with a uniformly distributed noise signal for the motor commands (“noise signal”). As expected, the observed dimension is maximal. The sensor values produced by that random controller show a lower dimension (“noise ctrl.”), as is expected due to the low-pass filtering property of the mechanical system. All other cases are with the TiPI maximization controller with different update rates . In particular, the comparison with the
case demonstrates that the exploration dynamics produces more complex behaviors than any fixed controller.
Figure 12.
Pairwise distances of chunks with length 10 s. Distance is defined as the length of the vector of maximal projections of the first 6 principal components.