Fig 1.
Schematic illustration of behavioural route learning heuristics (A-D) and recapitulation strategies (E-G); see Methods for details of implementation. Route learning heuristics:
(A) Baseline learning-route formation uses a global vector together with obstacle avoidance (where necessary). (B) Beacon aiming biases the agent toward salient visual features during learning, with obstacle avoidance taking precedence when within a pre-set proximity threshold (see Methods). (C) A gated oscillation introduces lateral oscillations into the route, restricting view learning to the portions of the oscillation in which the agent is returning to the central axis, i.e. where the phase corresponds to the second and fourth quarters of the oscillatory period. (D) An abstract implementation of goal-directed learning walks: for the strategy termed 'Goal loop', additional learning takes place on path segments to the goal, oriented relative to the start-goal direction. Recapitulation strategies, where the background represents the relative view familiarity when aligned with the route direction: (E) View Based Orientation (VBO) performs an on-the-spot scan to select the most familiar heading at each step. (F) Familiarity Based Modulation (FBM) does not perform on-the-spot scans; instead the agent adjusts turn size and step size (sFBM) based on the familiarity value. (G) Cast and Surge (CS) casts around the heading given by view based orientation, modulated according to the best familiarity from an on-the-spot scan. Casting is suspended in favour of surging if the change in this familiarity is positive. Casting resumes with a small cast angle when the route is reached, essentially reducing the strategy to view based orientation.
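The on-the-spot scan underlying VBO (panel E) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the familiarity measure (negative mean squared pixel difference against stored route views), the function names, and the scan parameters are all assumptions; rotation of a panoramic view is approximated by a horizontal pixel roll.

```python
import numpy as np

def familiarity(view, stored_views):
    """Familiarity as the negative of the smallest mean squared pixel
    difference to any stored route view (an assumed stand-in for the
    paper's familiarity measure)."""
    diffs = [np.mean((view - s) ** 2) for s in stored_views]
    return -min(diffs)

def vbo_heading(panorama, stored_views, scan_range=120.0, step=10.0):
    """View Based Orientation: scan candidate headings on the spot and
    return the most familiar one. `panorama` is a greyscale array whose
    columns span 360 degrees, so a rotation is a column-wise roll."""
    width = panorama.shape[1]
    best_heading, best_fam = 0.0, -np.inf
    for angle in np.arange(-scan_range / 2, scan_range / 2 + step, step):
        shift = int(round(angle / 360.0 * width))
        rotated = np.roll(panorama, -shift, axis=1)
        fam = familiarity(rotated, stored_views)
        if fam > best_fam:
            best_heading, best_fam = angle, fam
    return best_heading, best_fam
```

With a 36 x 180 pixel view (as in Fig 6B), one column corresponds to 2 degrees of rotation, so an agent facing 20 degrees off a stored view recovers that offset from the scan.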
Fig 2.
Divergence patterns and means across strategies as lateral displacement increases.
(A-C) Absolute initial displacements vs mean test route displacements for a 20m learning route, across 75 environment seeds with 45 starting positions (up to ±5.5m), evaluated with respect to the entire learning route for strategies involving (A) View Based Orientation (VBO), (B) Familiarity Based Modulation (FBM) and (C) Cast and Surge (CS). (D-F) The same analysis as (A-C), but evaluated against the last 5% of the learning route (i.e. destination reaching). (G, H) Mean test route displacements averaged across all initial displacements, with respect to the entire learning route and the last 5% of the learning route respectively. Bars which share a letter mark groupings that are not significantly different, according to a pairwise Games-Howell comparison with α = 0.05. In all cases error bars represent standard error on the mean.
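The displacement metric described here can be sketched as a mean nearest-point distance from the test route to the training route (or to its last 5% for destination reaching). This is an assumed formalisation for illustration; the paper's exact computation may differ.

```python
import numpy as np

def mean_displacement(test_route, training_route, tail_fraction=1.0):
    """Mean distance from each test-route point to the nearest point on
    the training route. tail_fraction=0.05 evaluates against only the
    last 5% of the training route, as in the destination-reaching panels.
    Routes are sequences of (x, y) positions."""
    ref = np.asarray(training_route, dtype=float)
    k = max(1, int(np.ceil(tail_fraction * len(ref))))
    ref = ref[-k:]
    test = np.asarray(test_route, dtype=float)
    # Pairwise distances (n_test, n_ref), then nearest reference point.
    d = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```

Evaluating against the route tail penalises routes that run parallel to, but never reach, the destination, which is why panels (D-F) can differ from (A-C).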
Fig 3.
Training and test route examples across strategies for a single environmental seed.
Yellow corresponds to the training route (arrows) and destination (cross), grey corresponds to test routes, and trees are illustrated with brown and green circles representing the average trunk and canopy diameters respectively. Strategies shown: (A) 'Baseline + VBO', (B) 'Baseline + FBM', (C) 'Baseline + CS', (D) 'Restricted (Res.) FOV + VBO', (E) 'Res. FOV + CS', (F) 'Beacon Aiming + VBO', (G) 'Beacon Aiming + CS', (H) 'Gated Oscillatory (Osc.) + VBO', (I) 'Gated Osc. + CS', (J) 'Goal Loop + VBO', (K) 'Goal Loop + CS', (L) 'Goal Loop + Gated Osc. + VBO', (M) 'Goal Loop + Gated Osc. + CS'.
Fig 4.
Divergence depends on length, displacement and navigational strategy.
(A-D) For the strategies 'Baseline + VBO', 'Gated Osc. + VBO', 'Baseline + CS' and 'Gated Osc. + CS': mean test route displacements as a function of absolute initial displacement and distance along the training route, evaluated across 75 environment seeds. The red line marks the distance along the training route at which the mean displacement from the route matches the initial displacement (i.e. parallel performance); darker shades indicate larger displacements. (E-H) Example training (100m, yellow) and test (grey) routes for the same strategies, with the cross marking the route end and the rapid divergence radius marked by the dotted black line. Red arrows indicate similar patterns of agent loss. (I) For these strategies, mean test route displacements over the full route length and across all absolute initial displacements. (J) For these strategies, the rapid divergence rate, i.e. the proportion of routes which never enter a circle centred on the goal with radius equal to half the average distance between the start of the training route and the goal. For both (I) and (J), error bars represent standard error on the mean, and groupings that are not significantly different are annotated, according to a Games-Howell pairwise comparison with α = 0.05.
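The rapid divergence rate in (J) is defined explicitly enough to sketch. The sketch below uses the start-to-goal distance of the given environment for the radius; the caption averages this distance across seeds, so treat the function name and that simplification as assumptions.

```python
import numpy as np

def rapid_divergence_rate(test_routes, goal, route_start):
    """Proportion of test routes that never enter a circle centred on
    the goal, with radius half the start-to-goal distance (the 'rapid
    divergence radius'). Each route is a sequence of (x, y) positions."""
    goal = np.asarray(goal, dtype=float)
    radius = 0.5 * np.linalg.norm(goal - np.asarray(route_start, dtype=float))
    diverged = 0
    for route in test_routes:
        dists = np.linalg.norm(np.asarray(route, dtype=float) - goal, axis=1)
        if dists.min() > radius:  # route never came near the goal
            diverged += 1
    return diverged / len(test_routes)
```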
Fig 5.
Scan range can be modulated without compromising performance.
(A) Maximum scan range versus mean test route displacement for 'Baseline + CS' with and without modulation of the scan range by familiarity, evaluated for a 20m training route, 5 displacements from −2m to 2m and 10 environment seeds. Error bars represent standard error on the mean. (B) Distance from the training route versus individual view scan range relative to the maximum permissible scan range (set at 160 degrees) for 'Baseline + CS' with and without scan range modulation. Black markers represent every scan taken by the agent across all 10 environment seeds. Black and red lines represent the mean scan range as a function of absolute distance from the training route for agents operating with modulated and non-modulated scan ranges respectively. (C, D) Example test routes (wedges) (C) without and (D) with scan range modulation at 2m displacement for a single seed, converging to a training route (yellow arrows) leading to a target (yellow cross). Wedge size and the grey colour spectrum represent the size of each scanning bout.
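One way to modulate scan range by familiarity is a clamped linear mapping from the maximum permitted range down to a small range as familiarity improves. The linear form, the minimum range, and the calibration band are assumptions for illustration; only the 160-degree maximum comes from the caption.

```python
def modulated_scan_range(fam, fam_lo, fam_hi,
                         min_range=20.0, max_range=160.0):
    """Shrink the scan range as view familiarity improves (a hypothetical
    linear rule; the paper's exact modulation may differ). `fam_lo` and
    `fam_hi` bracket the familiarity values seen off- and on-route."""
    # Normalise familiarity to [0, 1], clamping outside the band.
    t = (fam - fam_lo) / (fam_hi - fam_lo)
    t = max(0.0, min(1.0, t))
    return max_range - t * (max_range - min_range)
```

Under such a rule, agents far from the route scan widely while agents on the route make only small scanning bouts, matching the pattern shown in (B)-(D).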
Fig 6.
Simulation environment and experimental procedure.
(A) Example Unity simulation environment, with a 20m training route (red) and release points along a line perpendicular to the route and running through the start position (black line). (B) Example agent view (acquired from the start of the route in A): panoramic, greyscale and measuring 36 × 180 pixels. (C) Agents first traverse a training route of 20m or 100m (depending on the experiment) according to a route learning heuristic, are then displaced to each of the release points, and are permitted to recapitulate for a distance of 1.5 times the training route length.
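The release-point layout described in (A), combined with the 45 starting positions up to ±5.5m used in Fig 2, can be sketched as points spaced along the perpendicular through the route start. The function name and the uniform spacing are assumptions for illustration.

```python
import numpy as np

def release_points(start, route_direction, n=45, max_offset=5.5):
    """Release points evenly spaced along the line perpendicular to the
    route direction and passing through the start position. Returns an
    (n, 2) array of x, y positions."""
    d = np.asarray(route_direction, dtype=float)
    d /= np.linalg.norm(d)
    perp = np.array([-d[1], d[0]])  # route direction rotated 90 degrees
    offsets = np.linspace(-max_offset, max_offset, n)
    return np.asarray(start, dtype=float) + offsets[:, None] * perp[None, :]
```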