Autonomous learning of features for control: Experiments with embodied and situated agents

doi:10.1371/journal.pone.0250040

Fig 1.

Illustration of the Walker2DBullet (top-left), HalfCheetahBullet (top-right), BipedalWalker (bottom-left), and MIT racecar (bottom-right) problems.

Fig 2.

Schematization of the 5 experimental conditions (EtE, AE, AE-FM, StS, and FStS).

The picture at the top of each image indicates the observation (e.g., the angle and velocity of the agent’s joints, the proximity measures, etc.). The action vectors are indicated with the letter a. The vector of features extracted by the auto-associative or the sequence-to-sequence network is indicated with the letter z. The vector of features extracted by the forward model is indicated with the letter h. A indicates an auto-associative network that receives as input the observation at time t and produces as output the same observation at time t. F indicates a forward-model network that receives as input the vector z, extracted from the auto-associative network at time t, and the action vector a at time t, and produces as output the vector z at time t+1. S5 indicates a sequence-to-sequence associative network that receives as input the observations at times [t-4, t] and produces as output the same observations at times [t-4, t]. FS5 indicates a sequence-to-sequence network that receives as input the observations at times [t-4, t] and produces as output the observations at times [t-3, t+1]. C indicates the control network that receives as input the observation (in the EtE condition), the feature vector z (in the AE, StS, and FStS conditions), or the feature vectors z and h (in the AE-FM condition). C produces as output the action vector that determines the movement of the agent’s joints or wheels and consequently the state (s) of the environment (E) at time t+1. The new state of the environment determines the observation at time t+1.
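
The input/target pairings described for the S5, FS5, and F networks can be illustrated with a small NumPy sketch. It only builds training pairs from a recorded trajectory and does not reproduce the networks themselves. The function names, array layouts, and toy dimensions are hypothetical and chosen for illustration.

```python
import numpy as np

def sequence_pairs(obs, window=5, shift=0):
    """Build (input, target) windows from observations of shape (T, obs_dim).

    shift=0 mirrors S5: the target is the same window [t-4, t].
    shift=1 mirrors FS5: the target is the window [t-3, t+1].
    """
    obs = np.asarray(obs)
    n = len(obs) - window - shift + 1  # number of complete pairs
    inputs = np.stack([obs[i:i + window] for i in range(n)])
    targets = np.stack([obs[i + shift:i + shift + window] for i in range(n)])
    return inputs, targets

def forward_model_pairs(z, actions):
    """Pair (z_t, a_t) with z_{t+1}, as described for the F network."""
    z = np.asarray(z)
    actions = np.asarray(actions)
    inputs = np.concatenate([z[:-1], actions[:-1]], axis=1)
    targets = z[1:]
    return inputs, targets

# Toy trajectory: 8 time steps, 1-dimensional observations (assumed sizes).
obs = np.arange(8, dtype=float).reshape(8, 1)
s5_in, s5_out = sequence_pairs(obs, shift=0)
fs5_in, fs5_out = sequence_pairs(obs, shift=1)
```

For the auto-associative network A, the pairing is simply each observation with itself, so no helper is needed.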

Fig 3.

Performance during the training process in the case of the Walker2DBullet problem.

Performance refers to the average fitness of the best agent achieved so far. Data were computed by post-evaluating the best individual of each generation for 3 episodes. Each curve shows the average result of 10 replications. The top and bottom figures show the results obtained in the experiments performed in the pre-training and continuous training modalities, respectively.
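
One plausible way to compute such a curve is sketched below. It assumes that "best so far" means a running maximum over generations of the episode-averaged post-evaluation score, which is an interpretation and not something the caption states. The function name and the array layout (replications, generations, episodes) are hypothetical.

```python
import numpy as np

def best_so_far_curve(post_eval):
    """Average best-so-far fitness across replications.

    post_eval has shape (replications, generations, episodes) and holds
    the post-evaluation fitness of each generation's best individual.
    """
    post_eval = np.asarray(post_eval, dtype=float)
    per_generation = post_eval.mean(axis=2)                   # average over episodes
    best = np.maximum.accumulate(per_generation, axis=1)      # running maximum
    return best.mean(axis=0)                                  # average over replications

# Toy data: 2 replications, 3 generations, 2 episodes (assumed sizes).
toy = np.array([
    [[1, 1], [3, 3], [2, 2]],
    [[0, 0], [0, 0], [4, 4]],
])
curve = best_so_far_curve(toy)
```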

Fig 4.

Mean-squared error produced by the feature-extracting networks before the training of the control network (0 steps) and during the training of the control network (after 25 × 10⁶ and 50 × 10⁶ steps).

Results of the experiments performed in the AE, AE-FM, StS, and FStS experimental conditions. The boxplots shown in black and red show the data of the experiments performed in the pre-training and continuous training modalities, respectively. In the case of the AE* and StS* conditions, some of the red boxplots are not visible because the distribution of the offset is close to 0.0.

Fig 5.

Performance of the sequence-to-sequence condition in the continuous training modality (StS*) and of the end-to-end condition (EtE) during the evolutionary process in the case of the HalfCheetahBullet, BipedalWalkerHardcore, and MIT racecar problems (top, middle, and bottom figures, respectively).

Performance refers to the average fitness of the best agent achieved so far. Data were computed by post-evaluating the best individual of each generation for 3 episodes in the case of the MIT racecar and HalfCheetahBullet problems and for 20 episodes in the case of the BipedalWalkerHardcore problem. Mean and 90% bootstrapped confidence intervals of the mean (shaded area) across 10 replications per experiment.
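
A bootstrapped confidence interval of the mean can be obtained along the following lines. The sketch uses the percentile bootstrap, which is an assumption, since the caption does not say which bootstrap variant was used. The function name, the number of resamples, and the sample values are hypothetical.

```python
import numpy as np

def bootstrap_ci_mean(values, n_boot=10000, level=0.90, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample the replications with replacement, n_boot times.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_means, [alpha, 1.0 - alpha])
    return values.mean(), lower, upper

# Made-up fitness values for 10 replications at a single point of the curve.
mean, lower, upper = bootstrap_ci_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Applying the function at every generation yields the lower and upper edges of the band drawn around the mean curve.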
