Development of swarm behavior in artificial learning agents that adapt to different foraging environments | PLOS One

Fig 1.

Structure of the ECM.
The ECM consists of two layers, one for the percepts and one for the actions. Percepts and actions are connected by edges whose weight h_ij determines the transition probability from the given percept to each action (see Sec. 2.2 for details on the model).
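
The role of the edge weights can be illustrated with a minimal sketch. It assumes the transition probability is the weight h_ij normalized over all actions of the same percept, which is a common choice but is not spelled out in this caption; the numeric weights are hypothetical.

```python
import random

def transition_probabilities(h_row):
    """Normalize the edge weights leaving one percept into probabilities.

    Assumption: p_ij = h_ij / sum_k h_ik (not stated in the caption).
    """
    total = sum(h_row)
    return [h / total for h in h_row]

def sample_action(h_row, rng=random):
    """Draw an action index with probability proportional to its weight."""
    return rng.choices(range(len(h_row)), weights=h_row, k=1)[0]

# Hypothetical weights of one percept towards the actions "go" and "turn".
h = [3.0, 1.0]
print(transition_probabilities(h))  # [0.75, 0.25]
```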

Fig 2.

Graphical representation of the percepts’ meaning.
Only the front visual range (colored region) is considered, which corresponds to the values that category s_f can take. The focal agent is drawn with a larger arrow than its frontal neighbors. The agent sees only the neighbors inside its visual range; it can distinguish whether the majority of them are receding (light blue) or approaching (dark blue) and whether there are fewer than three or at least three.
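
As a rough illustration, one category of the percept could be encoded as follows. The label "0" for an empty region, the use of the total neighbor count for the threshold, and the tie-breaking rule are assumptions made for this sketch; the caption does not specify them.

```python
def encode_region(n_receding, n_approaching):
    """Map the neighbors seen in one visual region to a category label.

    Labels follow the notation "<3_r", ">=3_r", "<3_a", ">=3_a".
    Assumption: "0" denotes an empty region, giving five values in total.
    """
    total = n_receding + n_approaching
    if total == 0:
        return "0"
    size = "<3" if total < 3 else ">=3"
    # Assumption: a tie between receding and approaching counts as receding.
    direction = "a" if n_approaching > n_receding else "r"
    return f"{size}_{direction}"

print(encode_region(1, 0))  # <3_r
print(encode_region(1, 3))  # >=3_a
```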

Fig 3.

Structure of the simulation.
Each ensemble of agents is trained for 10⁴ trials, where each trial consists of 50 global interaction rounds (g.i.r.). At each g.i.r., the agents interact sequentially (see text for details).
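
The nesting of trials, global interaction rounds and sequential agent turns can be sketched as a plain loop. The callbacks `interact` and `reset` are hypothetical placeholders; what happens inside one interaction is described in the main text and is not reproduced here.

```python
def train(agents, interact, n_trials=10_000, n_rounds=50, reset=None):
    """Run the trial / round / agent loop described in the caption.

    `interact(agent, agents)` and `reset(agents, trial)` are hypothetical
    callbacks supplied by the caller.
    """
    for trial in range(n_trials):
        if reset is not None:
            reset(agents, trial)       # e.g. re-initialize positions
        for _ in range(n_rounds):      # one global interaction round
            for agent in agents:       # agents act one after another
                interact(agent, agents)

# Tiny usage example: count how many individual interactions take place.
counter = {"n": 0}
train(list(range(3)), lambda a, all_agents: counter.update(n=counter["n"] + 1),
      n_trials=2, n_rounds=50)
print(counter["n"])  # 300
```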

Fig 4.

1D environment (world).
Agents are initialized randomly within the first 2V_R positions. Food is located at positions F and F′. d_F is the distance from the center of the initial region C to the food positions.

Table 1.

Description of the parameters used in the learning simulations with PS.

Fig 5.

Learning curves for interacting agents with d_F = 4, 10, 21 and for non-interacting (n.i.) agents with d_F = 21.
The curves show the percentage of agents that reach the food source and obtain a reward of R = 1 at each trial. For each task, the average is taken over 20 independent ensembles of 60 agents each, and the shaded area indicates the standard deviation. The inset zooms into the initial phase of the learning process and shows faster learning in the task with d_F = 10 than in the task with d_F = 21. In the case of d_F = 21, no agent is able to reach the food source in the first trial, and it takes the interacting agents approx. 200 trials to outperform the n.i. agents.

Fig 6.

Learned behavior at the end of the training process.
The final probabilities in the agents’ ECM for the action “go” are shown for each of the 25 percepts (5×5 table). (a) and (b) Final probabilities learned in the scenarios with d_F = 21 and d_F = 4, respectively. The average is taken over 20 ensembles (for each learning task) of 60 agents each. Background colors identify the learned behavior: blue denotes that the preferred action for that percept is “go” and orange denotes that it is “turn”. The darker the color, the higher the probability of that action, ranging from grey (p ≃ 0.5) through light (0.5 < p < 0.7) and normal (0.7 ≤ p < 0.9) to dark (p ≥ 0.9). Panels (c) and (d) show what the tables would look like if the behavior were based purely on alignment (the agent aligns with its neighbors with probability 1) or on cohesion (the agent goes towards the region with the higher density of neighbors with probability 1), respectively. See text for details.

Fig 7.

Final probability of taking the action “go” depending on the learning task (increasing distance to food source d_F) for four significant percepts.
The percepts are (< 3_r, < 3_a), (< 3_r, ≥ 3_a), (< 3_a, < 3_r), (≥ 3_a, < 3_r), respectively (see legend). The average is taken over the agents’ ECM of 20 independently trained ensembles (1200 agents) at the end of the learning process. Each ensemble performs one task per simulation (d_F does not change during the learning process).

Fig 8.

Trajectories (position vs. time) of an ensemble of 60 agents in one trial prior to any learning process.
The vertical axis displays the position of the agent in the world and the horizontal axis the interaction round (note that the trial consists of n = 50 rounds). Each line corresponds to the trajectory of one agent. However, some agents’ trajectories overlap, which is indicated by the color intensity. The trajectory of one particular agent is highlighted for clarity.

Fig 9.

Trajectories of all agents of an ensemble in the last trial of the learning process for (a) d_F = 21 and (b) d_F = 4.
Ensembles of agents trained to find distant food form aligned swarms (a), whereas agents trained to find nearby food form cohesive, unaligned swarms (b). With the same number of interaction rounds, aligned swarms (a) cover larger distances than cohesive swarms (b). In addition, observe that trajectories in panel (b) spread less than in Fig 8.

Fig 10.

Evolution of the global alignment parameter through the learning processes with d_F = 4, 21.
At each trial, there is one data point that displays the average of the order parameter, first over all the (global) interaction rounds of the trial and then over 20 different ensembles of agents, where each ensemble learns the task independently. Shaded areas represent one standard deviation.
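
The two-stage averaging behind each data point can be sketched as below. The input layout (one list of per-round values for each ensemble) and the use of the population standard deviation are assumptions for illustration; the caption does not say which estimator was used.

```python
from statistics import mean, pstdev

def trial_point(values_by_ensemble):
    """Return (mean, standard deviation) for one trial.

    `values_by_ensemble[e][r]` is the order parameter of ensemble e at
    global interaction round r (hypothetical layout). Each ensemble is
    first averaged over its rounds, then the ensembles are averaged.
    """
    per_ensemble = [mean(rounds) for rounds in values_by_ensemble]
    # Assumption: population standard deviation across ensembles.
    return mean(per_ensemble), pstdev(per_ensemble)

# Two hypothetical ensembles with three rounds each.
avg, spread = trial_point([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
print(avg, spread)  # 0.5 0.5
```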

Fig 11.

Evolution of the average number of neighbors around each agent through the learning processes with d_F = 4, 21.
At each trial, there is one data point that displays the average of M, first over all the (global) interaction rounds of the trial and then over 20 different ensembles of agents, where each ensemble learns the task independently. Shaded areas represent one standard deviation.

Fig 12.

Trajectories of an ensemble of 60 agents, in a world of size W = 8000, shown over 5000 interaction rounds.
(a) Agents trained with d_F = 21 form a swarm that continuously loses members until it dissolves completely. (b) Agents trained with d_F = 4 form a highly cohesive swarm for the entire trial. The centered inset of this plot shows the first 2500 rounds, with a re-scaled vertical axis to observe the movement of the swarm. Insets on the right zoom in to 20 interaction rounds so as to resolve individual trajectories.

Fig 13.

Evolution of the average number of neighbors throughout the trial of 5000 interaction rounds.
The average is taken over 20 ensembles of 60 agents each, where the simulation is performed independently for each ensemble. Shaded areas indicate one standard deviation.

Fig 14.

Average number of neighbors (in percentage) and global and local alignment parameters as a function of the distance d_F.
Note that d_F is the distance to the point where food is placed during the training. Each point is the average of the corresponding parameter over all interaction rounds (50) of one trial, and over 100 trials. 20 already trained ensembles are considered.

Fig 15.

Percentage of agents that visit the positions situated at a distance from C given on the horizontal axis.
Since C is located at world position 6 (see Fig 4), a distance of e.g. 10 on the horizontal axis refers to the world positions 16 and 496. The already trained ensembles walk for one trial of 50 interaction rounds. For each of the four trainings (see legend), the performance of 20 ensembles is considered.
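
The mapping from a distance on the horizontal axis to the two world positions can be checked with a short sketch. It assumes a periodic world of size W = 500, which is consistent with the positions 16 and 496 quoted above; the function name is hypothetical.

```python
def positions_at_distance(center, distance, world_size):
    """Return the two positions at `distance` from `center` in a periodic world."""
    forward = (center + distance) % world_size
    backward = (center - distance) % world_size
    return forward, backward

# C = 6 and a distance of 10; W = 500 is assumed for this example.
print(positions_at_distance(6, 10, 500))  # (16, 496)
```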

Fig 16.

Swarm velocity 〈ξ〉 as a function of the training distance d_F.
Each point is the average over the agents of 20 independently trained ensembles that have performed 50 independent trials each.

Fig 17.

Trajectories of one ensemble of 60 agents that were trained with d_F = 21.
The world size is W = 500. Color intensity indicates the number of agents following the same trajectory, i.e. moving within the swarm. Some agents leave the swarm and then rejoin it when the swarm completes the cycle and starts a new turn. Only the first 5000 interaction rounds (of a total of 10⁵) are shown.

Fig 18.

Hidden Markov model for the CCRW.
There are two modes, the intensive and the extensive, with probability distributions given by p_I and p_E (see text for details). The probability of transition from the intensive (extensive) to the extensive (intensive) mode is given by 1 − γ_II (1 − γ_EE), where γ_II and γ_EE are the probabilities of remaining in the intensive and extensive mode respectively. δ is the probability of starting in the intensive mode.
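
A minimal sketch of sampling step lengths from such a two-mode hidden Markov model follows. The step-length distributions p_I and p_E are passed in as hypothetical sampler callbacks, since their form is given in the main text and not in this caption.

```python
import random

def sample_ccrw(n_steps, delta, gamma_ii, gamma_ee,
                draw_intensive, draw_extensive, rng):
    """Sample step lengths from a two-mode hidden Markov model.

    `draw_intensive(rng)` and `draw_extensive(rng)` are hypothetical
    samplers standing in for p_I and p_E.
    """
    # Start in the intensive mode with probability delta.
    mode = "I" if rng.random() < delta else "E"
    steps = []
    for _ in range(n_steps):
        if mode == "I":
            steps.append(draw_intensive(rng))
            if rng.random() >= gamma_ii:   # leave with probability 1 - gamma_II
                mode = "E"
        else:
            steps.append(draw_extensive(rng))
            if rng.random() >= gamma_ee:   # leave with probability 1 - gamma_EE
                mode = "I"
    return steps

# Hypothetical samplers: short steps when intensive, long steps when extensive.
rng = random.Random(0)
steps = sample_ccrw(20, delta=0.5, gamma_ii=0.8, gamma_ee=0.9,
                    draw_intensive=lambda r: r.randint(1, 2),
                    draw_extensive=lambda r: r.randint(10, 20),
                    rng=rng)
print(len(steps))  # 20
```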

Fig 19.

Mean squared displacement (MSD).
Log-log (base 2) plot of the MSD as a function of the time interval for two types of trajectories: trajectories performed by agents trained with d_F = 21 (blue curve, circles) and by agents trained with d_F = 4 (orange curve, triangles). We observe that the former present ballistic diffusion, whereas the latter exhibit close-to-normal diffusion. 600 individual trajectories (10 ensembles of 60 agents) are considered for each case.
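
The distinction between ballistic and normal diffusion shows up as the slope of the MSD in a log-log plot: a slope close to 2 indicates ballistic motion and a slope close to 1 indicates normal diffusion. The sketch below assumes a time-averaged MSD over a single trajectory of unwrapped positions, which may differ from the averaging used for the figure.

```python
import math

def msd(trajectory, lag):
    """Time-averaged mean squared displacement for one time interval `lag`."""
    diffs = [(trajectory[t + lag] - trajectory[t]) ** 2
             for t in range(len(trajectory) - lag)]
    return sum(diffs) / len(diffs)

def loglog_slope(trajectory, lag_a, lag_b):
    """Slope of log2(MSD) versus log2(lag) between two time intervals.

    Requires a nonzero MSD at both lags.
    """
    rise = math.log2(msd(trajectory, lag_b)) - math.log2(msd(trajectory, lag_a))
    run = math.log2(lag_b) - math.log2(lag_a)
    return rise / run

# A perfectly ballistic walker moves one position per round.
ballistic = list(range(100))
print(loglog_slope(ballistic, 1, 8))  # 2.0
```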

Fig 20.

Survival probability as a function of the step length.
The survival probability is the percentage of step lengths larger than the corresponding value on the horizontal axis. Each panel depicts the data from the trajectory of one agent picked from (a) aligned swarms and (b) cohesive swarms, so that this figure represents the most frequently observed trajectory for each type of dynamics. The survival distributions of the four candidate models are also plotted. The distributions for each model are obtained from the maximum likelihood estimates of the corresponding parameters (see Sec. 4.3 for details). The curve for the CCRW model is obtained by an analytic approximation of the probabilities of each step length, given the maximum likelihood estimates of its parameters. Since the order of the sequence of step lengths is not relevant for this plot, the probability of each step length is estimated independently of its position in the sequence (see Eq (9)).
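
The empirical survival probability defined in the first sentence can be computed directly from a list of step lengths. The data in this sketch are hypothetical.

```python
def survival_percentage(step_lengths, x):
    """Percentage of step lengths strictly larger than x."""
    larger = sum(1 for s in step_lengths if s > x)
    return 100.0 * larger / len(step_lengths)

# Hypothetical step lengths from one trajectory.
steps = [1, 2, 2, 5]
print(survival_percentage(steps, 1))  # 75.0
print(survival_percentage(steps, 5))  # 0.0
```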

Table 2.

Average values of the MLE parameters for the different models.

Fig 21.

Violin plots that represent the Akaike weights obtained for each model.
(a) Akaike weights of trajectories of agents trained with d_F = 21 (aligned swarms). (b) Akaike weights of trajectories of agents trained with d_F = 4 (cohesive swarms). 600 individual trajectories —per type of swarm— were analyzed for each plot. The ‘•’ symbol represents the median and the vertical lines indicate the range of values in the data sample (e.g. PL model in figure (a) has extreme values of 0 and 1). Shaded regions form a smoothed histogram of the data (e.g. the majority of Akaike weights of the CCRW model in figure (a) have value 1, and there are no values between 0.2 and 0.8). See text for more details.
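
For reference, Akaike weights are conventionally computed from the AIC values of the candidate models as w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), with Δ_i = AIC_i − min AIC. The sketch below uses this standard definition; the AIC values and the names "model_3" and "model_4" are hypothetical placeholders.

```python
import math

def akaike_weights(aic):
    """Return Akaike weights for a dict mapping model name to AIC value."""
    best = min(aic.values())
    relative = {m: math.exp(-0.5 * (v - best)) for m, v in aic.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}

# Hypothetical AIC values for four candidate models of one trajectory.
weights = akaike_weights({"PL": 250.0, "CCRW": 200.0,
                          "model_3": 230.0, "model_4": 260.0})
print(max(weights, key=weights.get))  # CCRW
```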

Fig 22.

Percentage of trajectories that are best fit by each model according to the BIC criterion.
A model is considered to best fit the data of a given trajectory if it has the lowest BIC value and its difference with respect to the rest of the models is larger than 10. 600 individual trajectories —per type of swarm— were analyzed for each histogram.
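
The selection rule stated above can be sketched as a small function: a model wins only if its BIC is the lowest and every other model is more than 10 units above it. The BIC values and the names "model_3" and "model_4" are hypothetical.

```python
def best_model_by_bic(bic, threshold=10.0):
    """Return the best-fitting model name, or None if no model is decisive.

    `bic` maps model names to BIC values and needs at least two entries.
    """
    ranked = sorted(bic.items(), key=lambda item: item[1])
    (best_name, best_value), (_, runner_up) = ranked[0], ranked[1]
    # The gap to the runner-up is the smallest gap to any other model.
    return best_name if runner_up - best_value > threshold else None

print(best_model_by_bic({"PL": 250.0, "CCRW": 200.0,
                         "model_3": 230.0, "model_4": 260.0}))  # CCRW
print(best_model_by_bic({"PL": 205.0, "CCRW": 200.0}))          # None
```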
