A fully autonomous terrestrial bat-like acoustic robot

Echolocating bats rely on active sound emission (echolocation) for mapping novel environments and navigating through them. Many theoretical frameworks have been suggested to explain how they do so, but few attempts have been made to build an actual robot that mimics their abilities. Here, we present the ‘Robat’, a fully autonomous bat-like terrestrial robot that relies on echolocation to move through a novel environment while mapping it solely based on sound. Using the echoes reflected from the environment, the Robat delineates the borders of objects it encounters, and classifies them using an artificial neural-network, thus creating a rich map of its environment. Unlike most previous attempts to apply sonar in robotics, we focus on a biological bat-like approach, which relies on a single emitter and two ears, and we apply a biologically plausible signal processing approach to extract information about objects’ position and identity.


Introduction
The growing use of autonomous robots emphasizes the need for new sensory approaches to facilitate tasks such as obstacle avoidance, object recognition and path planning. One of the most challenging tasks faced by many robots is generating a map of an unknown environment while simultaneously navigating through this environment for the first time [1]. This problem is routinely solved by echolocating bats that perceive their surroundings acoustically (other animals also solve this task on a daily basis using a range of sensory modalities) [2]. By emitting sound signals and analyzing the returning echoes, bats can orient through a new environment and probably also map it [3] [4]. Inspired by this ability, we present the 'Robat', a fully autonomous terrestrial robot that relies solely on bat-like SONAR to orient through a novel environment and map it. Using a biologically plausible system with two receivers (ears) and a single emitter (mouth) that produced frequency-modulated (FM) chirps at a typical bat rate, the Robat managed to move through a large novel outdoor environment and map it in real time.
There have been many attempts to use airborne sonar for mapping the environment and moving through it using non-biological approaches, for example by using an array of multiple narrow-band speakers [5,6] [7] and/or multiple microphones [8]. These studies proved that by using multiple emitters, or by carefully scanning the environment with a sonar beam, as if it were a laser, one can map the environment acoustically, but these approaches are very far from the biological solution [9]. A bat emits relatively few sonar emissions towards an object, and it must rely on only two receivers (its ears) in order to extract spatial information from its very wide bio-sonar beam, which can reach 60 degrees (6 dB double side drop in amplitude [10] [11] [12]). Unlike the narrow-band signals typically used in robotic applications, the bat's wide-band signals provide ample spatial information, allowing it to localize multiple reflectors within a single beam. This is the approach we aimed to test and mimic in this study.
Numerous studies have shown that echoes generated by emitting bat-like sonar signals contain spatial information that can be exploited for localization and identification of objects [13] [14] [15] [16] [17] [18]. Several previous attempts have been made to model and mimic bats' spatial abilities of localization and mapping [19] [20]. One of the most comprehensive attempts to use a biological approach to map the environment was 'BatSLAM' [21], which relied on mammalian brain-like computation for simultaneous localization and mapping of a novel environment using biomimetic sonar. Using a biological representation of the data (the cochleogram) the BatSLAM algorithm generated topological maps in which the nodes represent unique places in the environment and the edges represent the robot's displacements between them. The approach of recognizing a location based on its unique acoustic signature was further broadened by Vanderelst et al. [6] who classified a wide range of natural scenes based on their acoustic statistics, once again, without extraction of their spatial characteristics. Vanderelst et al. limited the information extracted from the echoes to the acoustic resolution available to a bat, and they were still successful in achieving useful scene recognition.
Our work differs from these former studies in two important respects: (1) Our Robat moved through the environment autonomously, while the previous robots were driven by the user. (2) We mapped the 2D structure of the environment, while they mapped the position of the robot in the environment. Namely, in our approach the outlines of the objects encountered by the Robat were delineated so that paths (free of obstacles) were revealed for future use. In these previous studies, objects in the environment were mapped as locations with a unique acoustic representation so that, when encountered again, the agent could localize itself on the acoustic map, but no spatial information about objects' size or orientation was extracted. When moving autonomously, such information is essential for movement planning.
In addition to mapping, our Robat had to autonomously move through the environment while avoiding obstacles. Some previous attempts were made to model orientation and obstacle avoidance using a biological echolocation-based approach. For example, Vanderelst et al. [9] suggested a simple sensorimotor approach for obstacle avoidance based on turning away from the louder of the two echoes received by the ears. They showed that a simulated agent can move through a novel environment without any mapping of the positions or borders of the objects within it. This approach might be beneficial when an animal wants to move fast through the environment without an intention of returning to specific locations within it, but if the animal needs to find its way back to some point in this environment (e.g., to its roost), or to plan its movement to a specific location, some mapping must be performed. For example, the robust low-level sensorimotor heuristic presented in [9] could be combined with higher-level mapping algorithms (e.g., [22]).
To the best of our knowledge, our Robat is the first fully autonomous, bat-like, biologically plausible robot that moves through a novel environment while mapping it solely based on echo information, delineating the borders of objects and the free paths between them and recognizing their type.

Results
The Robat's goal was to move through an environment that it had never experienced before, finding its path between vegetation and other obstacles while mapping their locations, delineating their borders and identifying them (when possible), similar to a bat flying through a grove or shrubbery that it encounters for the first time (Fig 1A).

Acquisition
The Robat moved through the environment emitting echolocation signals every 0.5m, thus mimicking a bat flying at 5m/s while emitting a signal every 100ms, which is within the range of flight speeds and echolocation rates used by many foraging bats [23] [24,25] [26]. Every 0.5m, the Robat emitted three bat-like wide-band frequency-modulated sound signals while pointing its sensors (emitter and receivers) in three different headings: -60, 0 and 60 degrees relative to the direction of movement (Fig 1A). This procedure aimed to overcome the narrow acoustic beam of the Robat and to better mimic a bat's beam, which is typically much wider than that of our speaker (see Methods) [

Mapping
Following echo acquisition, acoustic peaks of interest (representing objects) were identified in the echoes (Fig 1B). Equivalent peaks, i.e., peaks returning from the same object, received by the two ears were matched and the reflecting objects were localized. The time delay between the emission and the arrival of the echoes was used to determine the distance of an object, and the difference between the times of arrival of the echo at the two ears was used to determine its azimuth (i.e., mapping was performed in 2D; Fig 1C, turquoise points depict the objects' locations; see Methods for full details). Importantly, the Robat was able to localize multiple objects whose echoes were received within a single beam (S1 Fig). This ability has not been reported in previous studies, and bats are likely able to do the same. After every 5 steps (i.e., 2.5m) the Robat applied an inflation and interpolation algorithm that incorporated the newly mapped objects into the map that had been created so far (based on the previous echoes, Fig 1C, yellow shaded area, see Methods). At each time step, following echo acquisition and object localization, the Robat planned its next movement according to the iterative map that had been created so far and according to the objects detected in the most recent acquisition. Movement planning was based on the bug algorithm [30], which can be simply described as turning 90 degrees to the right whenever an obstacle is encountered ahead, and then turning left to maneuver around the obstacle.
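The range and azimuth computation described in this paragraph can be illustrated with a minimal sketch. It assumes a speed of sound of 343 m/s and a far-field (plane wave) approximation in which the azimuth equals asin(c x ITD / d); the 7cm ear spacing is taken from the Methods. The function names and the sign convention are hypothetical and are not taken from the Robat's software.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C
EAR_SPACING = 0.07      # m, distance between the two microphones (from Methods)


def echo_range(delay_s):
    """Distance to a reflector from the two-way emission-to-echo delay."""
    return SPEED_OF_SOUND * delay_s / 2.0


def echo_azimuth(itd_s):
    """Azimuth in degrees from the interaural time difference (far-field assumption).

    A positive ITD is taken here to mean that the echo reached the left ear
    later, i.e. the object lies to the right; this convention is an assumption.
    """
    ratio = SPEED_OF_SOUND * itd_s / EAR_SPACING
    ratio = max(-1.0, min(1.0, ratio))  # guard against noise pushing |ratio| above 1
    return math.degrees(math.asin(ratio))


def localize(delay_left_s, delay_right_s):
    """Return (x, y) in metres in the sensor frame; y points along the heading."""
    mean_delay = (delay_left_s + delay_right_s) / 2.0
    r = echo_range(mean_delay)
    az = math.radians(echo_azimuth(delay_left_s - delay_right_s))
    return r * math.sin(az), r * math.cos(az)
```

Under these assumptions, a time difference of one sample period at 250 kS/s (4 μs) corresponds to roughly 1 degree of azimuth near the midline.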
The movement and mapping algorithms were tested in two outdoor environments: (1) The pteridophyte greenhouse (5m x 12m) and (2) The palm greenhouse (40m x 5m) both situated in the Tel Aviv University Botanical Garden.
The Robat successfully moved through both new environments without hitting objects and while mapping their locations and contour lines (see the Robat's trajectory depicted in black in Fig 2A). When an obstacle was placed in the Robat's way, it moved around it (Fig 2B). To quantify the mapping performance, we compared the contour of the objects as estimated by the Robat to the real contour (which we estimated from drone images in the palm greenhouse and measured manually in the pteridophyte greenhouse, see Methods). In the palm greenhouse, the mean distance between the two contours was 0.42 ± 0.74 m (mean ± STD), meaning that along the 35m trail that the Robat passed and mapped in the palm greenhouse, the estimated borders of the objects on both sides of the trail were off by 42cm on average relative to their real position. This might seem inaccurate when considering bats' ability to estimate range with an accuracy of less than 1cm in a highly controlled experiment [31] [32], but it should be emphasized that the Robat only detected and localized parts of the objects, while their borders were delineated based on our inflation and interpolation algorithm (Methods). Moreover, note that many of the objects in our environment were plants with multiple branches, so that the exact borders of the objects were inherently difficult to define (even in the drone images). Similar performance (0.44 ± 0.25

Classification and decision making
When moving through the environment, a real bat can probably use echoes to classify objects into categories (e.g., rocks, trees, bushes) and even to identify specific objects (e.g., a specific beech tree in its favorite foraging site). Such recognition would greatly assist the bat in navigating, for example, by recognizing specific landmarks at important turning points along its flight route, and it could also assist its foraging, for example, by recognizing specific vegetation that is rich in fruit or insects [33] [17]. So far we have demonstrated that the Robat can translate a novel natural environment into a binary map of open spaces and obstacles. In order to improve the mapping, we added a classification step to the algorithm, which was performed using a neural-network that was trained to distinguish between two object categories: plants and non-plants. To this end, a set of acoustic features was extracted from the echoes and used as input for the network (Methods). The Robat was able to classify objects as plants or non-plants significantly above chance level (Fig 2C and Table 1) with a balanced accuracy of 68% (chance was 50%, P = 0.01, based on a permutation test with 100 permutations, the balanced Finally, we tested the functionality of this classification ability by purposefully driving the Robat into a dead end where it faced obstacles in all directions ahead (i.e., right, left and straight ahead, S7 Fig). The Robat had to determine which of the three obstacles was a plant, through which it could drive, and it did so successfully in ~70% of the cases (in accordance with its ~70% classification accuracy, see movie: https://www.youtube.com/watch?v=LzGGuzvYSH8, from second 49 onward).

Discussion
In this study, we managed to build an autonomous robot that moves through a novel environment and maps it acoustically using bat-like bio-sonar. We achieved high mapping accuracy, despite our simple approach, proving the great potential of using active wide-band sound emissions to map the environment. We created a (2D) topographic map which would allow us to plan future movements through the environment (and not a topological map). The statistical approach presented in [9] is therefore complementary to ours, allowing classification of specific locations based on their echoes. For example, when navigating back to a specific location using the map created by the Robat, their approach could be used to validate the arrival at the desired location and also to help adjust the map to improve its accuracy.
The Robat was much slower than a real bat, stopping for ca. 30 seconds every 0.5m to acquire echoes. This slowness was, however, merely a result of the mechanical limitations of our system, mainly the slow gimbal. Using a speaker with a wider beam (which would eliminate the need to turn at each location) would allow the Robat to acquire echoes on the move, while moving as fast as a bat. Importantly, despite our stopping for echo recording, the acoustic information we acquired did not differ from that received by a bat, except for the fact that a bat's echoes would also be slightly Doppler-shifted (but this would probably not affect any of our results).
In some respects, our processing was not fully bat-like. We used a sampling rate of 250kHz, which is higher than the theoretical time precision of the auditory system [34]. Bats and other small mammals have been shown to estimate azimuth with an accuracy of <10 degrees (the exact accuracy depends on the azimuth, e.g., [35,36]). This accuracy corresponds to an inter-aural time difference of <10μs, which is in accordance with our sampling rate (sampling at 250kHz is equivalent to an error of ~5μs when estimating time differences between the two ears). Therefore, even if our computation was different from that of a bat (which does not cross-correlate two highly sampled time signals), the overall accuracy allowed by our approach was not better than that of a bat. Moreover, due to the inflation and interpolation method that we used in order to delineate the borders of the objects, the effective accuracy of our mapping was much lower than that allowed by this high sampling rate, and probably much lower than that available to bats [31,32]. Therefore, we hypothesize that using an auditory preprocessing model like that used in BatSLAM, for example [21], would probably not change our results dramatically. Another advantage that we had over real bats was the relatively large distance between the two ears, which were spaced 7cm apart, ca. two times more than in a large bat. This probably allowed more accurate azimuth estimations, but once again, we hypothesize that because of the use of inflation, this did not improve our performance dramatically. Importantly, we managed to extract information about multiple objects within a single sonar beam. On average, in each echo that contained reflections (some echoes did not) we detected 4.1 objects positioned in a range of azimuths between -50 and 50 degrees. Another important difference between the Robat and an actual bat is the lack of an external ear in the Robat.
The angle-dependent frequency response of the external ear allows bats (and other animals) to gain information about the location of a sound source in three dimensions. Because we relied on temporal information for object localization, we used a first approximation of an ear. Adding a structure mimicking the external ear could have further improved our localization performance, and it would be essential in order to expand our mapping to 3D. In order to better mimic the bat's beam, we used three beams (directed 60 degrees apart), but this made our task easier than a bat's because we could analyze the echoes returning from each direction separately. We therefore also tested an approach in which we summed the three echoes collected (with different headings) at each acquisition point, thus mimicking a wider beam. Even with this degraded data, we were able to map the environment with a decent accuracy of 1.14 ± 0.70 m (mean ± STD, S6b Fig), an accuracy that would allow future planning of trajectories while avoiding obstacles on the way.
In some respects our approach was probably much more simplistic than that of a bat. For example, the obstacle avoidance algorithm was very simple, and a better approach would probably use control theory to steer the Robat around obstacles [37]. In terms of mission priority, we used serial processing in which the Robat first processes new incoming sensory information; it then performs the urgent low-level task of obstacle avoidance and path planning, and only every several acquisitions does it perform the high-level process of map integration. There is much evidence that the mammalian brain also performs sensory tasks sequentially (e.g., [38]), but it would be interesting to test some procedures for parallel processing in the future.
In addition to mapping the positions of objects in the environment, a complete map should also include information about the objects, such as their type or identity. To show that such information is available in the echoes, we developed a classifier that can categorize objects based on their echo. We hypothesize that the moderate classification performance that we achieved (68%) was a result of our choice of categories. We trained the classifier to distinguish between plant and non-plant objects, but these are not always two clearly distinct groups. For example, the echo of an artificial object such as a fence will have vegetation-like acoustic features, and indeed most of the classifier's mistakes were recognition of non-plants as plants. Bats might thus divide their world of objects differently, perhaps into diffusive vs. glint-reflecting objects.
Altogether, we show how a rather simple signal processing approach allows a robot to autonomously move through and map a new environment based on acoustic information. Our work thus proves the great potential of using acoustic echoes to map and navigate, a potential that is translated into action by echolocating bats on a daily basis.

Acquisition
The Robat was based on the 'Komodo' robotic platform (Robotican, Israel). The bio-sonar sensor was mounted on a DJI Ronin gimbal, which allowed turning the sensing unit relative to the base of the robot in a stable manner. The sensing unit included an ultrasonic speaker acting as the bat's mouth (VIFA XT25SC90-04) and 2 ultrasonic microphones acting as the bat's ears, spaced 7cm apart (Avisoft-Bioacoustics CM16/CMPA40-5V Condenser). The speaker and the microphones were connected to A/D and D/A converters which were based on the USB-1608GX-2AO NI DAQ board, sampling at 250 kS/s at each ear. The emitted signal was a 10ms FM chirp sweeping from 100 to 20kHz. It was amplified using a Sony amplifier (XM-GS4). A uEye RGB camera was used for image collection for validation purposes only. Three 2.4GHz/5.8GHz antennae were mounted at the rear of the Robat for wireless communication between the Robat and a stationary station. This allowed viewing the map created by the Robat in real time, but importantly, all calculations and decisions were performed on the Robat itself.
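For illustration, the emitted signal described above can be synthesized as follows. This is a minimal sketch that assumes a linear sweep and unit amplitude (the text specifies only the duration, the frequency range and the sampling rate); the function name fm_chirp is hypothetical.

```python
import math

FS = 250_000          # samples per second (from the text)
DURATION = 0.010      # s, chirp length (from the text)
F_START = 100_000.0   # Hz, start of the downward sweep (from the text)
F_END = 20_000.0      # Hz, end of the sweep (from the text)


def fm_chirp(fs=FS, duration=DURATION, f0=F_START, f1=F_END):
    """Downward FM sweep; the linear sweep shape is an assumption."""
    n = int(round(fs * duration))
    k = (f1 - f0) / duration  # sweep rate in Hz/s (negative for a downward sweep)
    samples = []
    for i in range(n):
        t = i / fs
        # Phase is the integral of the instantaneous frequency f0 + k * t.
        phase = 2.0 * math.pi * (f0 * t + 0.5 * k * t * t)
        samples.append(math.sin(phase))
    return samples


chirp = fm_chirp()
```

The highest frequency in the sweep (100kHz) stays below the Nyquist limit of 125kHz imposed by the 250 kS/s converters.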

Mapping
While moving, the Robat stopped every 0.5m (based on its odometry measurements) and the sonar system (emitter and receivers) was rotated to three different headings [0, 60, -60 degrees] relative to the direction of movement. At each heading, a sound signal (see above) was emitted and echoes were recorded. Each recording was 0.035 sec long, equivalent to a range of ca. 6 meters (farther objects were thus ignored at each emission). The signal-to-echo delay time and the time-of-arrival differences of the echoes at the two ears (i.e., the Interaural Time Difference) were used together in order to map the environment. To this end, the received signals were cross-correlated with the theoretical emitted signal. The cross-correlated signal was normalized relative to the maximum value of the recording, and a peak detection function was used to find peaks of interest (python peakutil with a minimal peak distance of 0.002 sec and a minimal amplitude of 0.3).
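The detection steps in this paragraph (matched filtering, normalization and peak picking) can be sketched in plain Python. This is an illustrative stand-in and not the peak detection library used on the Robat: the greedy strongest-first rule for enforcing the minimal peak distance, the function names and the assumed speed of sound of 343 m/s are all assumptions.

```python
def cross_correlate(recording, template):
    """Sliding dot product of the template against the recording (valid lags only)."""
    n, m = len(recording), len(template)
    return [
        sum(recording[lag + j] * template[j] for j in range(m))
        for lag in range(n - m + 1)
    ]


def normalize(values):
    """Scale so that the largest absolute value equals 1."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values] if peak > 0 else list(values)


def pick_peaks(values, fs, min_dist_s=0.002, min_amp=0.3):
    """Indices of local maxima above min_amp that are at least min_dist_s apart."""
    min_dist = int(round(min_dist_s * fs))
    candidates = [
        i for i in range(1, len(values) - 1)
        if values[i] >= min_amp
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]
    kept = []
    # Visit the strongest candidates first and drop any that crowd a kept peak.
    for i in sorted(candidates, key=lambda idx: values[idx], reverse=True):
        if all(abs(i - j) >= min_dist for j in kept):
            kept.append(i)
    return sorted(kept)


def lag_to_range(lag, fs, speed_of_sound=343.0):
    """Convert a correlation lag in samples to a one-way distance in metres."""
    return speed_of_sound * (lag / fs) / 2.0
```

With these assumptions, a 0.035 sec recording corresponds to a maximal range of about 6m, in line with the value given in the text.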
To match peaks arriving at the right and left ears, for each peak detected in one ear, an equivalent peak was searched for in the other ear within a window of +/-0.001 sec. If a peak was found, the Pearson correlation was used to determine if the two echoes were reflections of the same object. For this purpose, a segment of 0.01 seconds around each peak was cut and the correlation between the two time signals (one from each ear) was computed. Only correlations higher than 0.9 were accepted. This threshold was conservative, thus potentially resulting in missed objects, but it reduced the localization of artifactual, non-existent objects. Because the Robat emitted every 0.5m, there was much overlap between the echoes of consecutive emissions. We were therefore likely to detect an object several times, so a conservative approach was chosen. In addition to its position, each object on the map was defined by three parameters: "C |T |P", where C is the Pearson correlation coefficient between the left and right ears for the specific point, T is the object's type based on its acoustic classification (either artificial or a plant), and P is the classification probability (see more below about the classification process).
Results in the controlled indoor environment showed that, using two ears, the mean error in distance estimation was 1.3 ± 2.1 cm (mean ± STD, S3 Fig) and the mean azimuth estimation error was 1.2 ± 0.7 degrees (mean ± STD, S3 Fig). Importantly, these are the results for a single reflector, so accuracy in the real environment, where many reflections are received at each point, will be lower.
Every 5 Robat-steps, newly localized objects were integrated into the map that had been created so far. This was done using an Iterative-Object-Inflation algorithm, which inflated points into squares and connected them. To this end, the entire area around the Robat was divided into a grid of 2000x2000 pixels (5x5 cm² each). Each detected object was placed in the corresponding pixel on the map and was inflated to an area of 20x20 pixels around its center (i.e., 1x1 m², S4 Fig). This procedure creates a binary map with 1's depicting objects and 0's depicting a free path. Pixels along the trajectory that the Robat previously moved through always received the value 0, depicting an open path (even if they were within the 20x20 window of a detected object).
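A minimal sketch of the left-right matching rule described above is given here. The window, segment length and threshold follow the text; choosing the closest candidate when several right-ear peaks fall inside the window, and trimming the segments symmetrically near the edges of the recording, are assumptions, as are all function names.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def match_peaks(left, right, peaks_left, peaks_right, fs,
                window_s=0.001, segment_s=0.01, min_corr=0.9):
    """Pair left-ear peaks with right-ear peaks that pass the correlation test.

    Returns a list of (left_index, right_index, correlation) tuples.
    """
    window = int(round(window_s * fs))
    half = int(round(segment_s * fs / 2))
    matches = []
    for pl in peaks_left:
        nearby = [pr for pr in peaks_right if abs(pr - pl) <= window]
        if not nearby:
            continue
        pr = min(nearby, key=lambda p: abs(p - pl))  # closest candidate (assumption)
        # Cut segments of equal length, aligned on the two peaks.
        before = min(half, pl, pr)
        after = min(half, len(left) - pl, len(right) - pr)
        seg_l = left[pl - before:pl + after]
        seg_r = right[pr - before:pr + after]
        c = pearson(seg_l, seg_r)
        if c > min_corr:
            matches.append((pl, pr, c))
    return matches
```

The returned correlation value corresponds to the parameter C stored with each mapped object.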
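The inflation step can be sketched as follows, using the grid size, cell size and inflation window given in the text. The sketch stores the binary map sparsely as a set of occupied pixels, places the origin at the center of the grid and omits the step that connects neighbouring squares; these choices and the function names are assumptions made for illustration.

```python
CELL = 0.05    # m per pixel (from the text)
GRID = 2000    # pixels per side (from the text)
INFLATE = 20   # side of the inflated square in pixels (from the text)


def to_pixel(x_m, y_m, origin=GRID // 2):
    """Map metric coordinates to grid indices; centring the origin is an assumption."""
    return origin + int(round(x_m / CELL)), origin + int(round(y_m / CELL))


def inflate(occupied, objects_m, trajectory_m):
    """Add inflated objects to the occupied set, then clear the driven path.

    A pixel in `occupied` corresponds to a 1 in the binary map; any other
    pixel corresponds to a 0 (free path).
    """
    half = INFLATE // 2
    for x, y in objects_m:
        cx, cy = to_pixel(x, y)
        for px in range(cx - half, cx + half):
            for py in range(cy - half, cy + half):
                if 0 <= px < GRID and 0 <= py < GRID:
                    occupied.add((px, py))
    # Pixels the robot has already driven through are always free.
    for x, y in trajectory_m:
        occupied.discard(to_pixel(x, y))
    return occupied
```

Clearing the trajectory after inflation reproduces the rule that previously visited pixels remain free even inside the 20x20 window of a detected object.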

Movement and obstacle avoidance
We chose a very simple obstacle avoidance approach also known as the 'bug algorithm' [39]. During the exploration process, the Robat moved forward in steps of 0.5m between consecutive acquisition points. When detecting an obstacle less than 1.2m in front of it, the Robat turned 90 degrees towards the right, and performed a 1m step towards the right (after checking that there is no obstacle ahead). After performing a 1m step to the right, the Robat turned 90 degrees to the left and acquired an echo. If no obstacle was detected (meaning that the obstacle has been passed) the Robat continued straight (i.e., in its previous direction before turning right). If the way was still blocked (i.e., the obstacle was not passed), the Robat turned again to the right and kept moving towards the right (90 degrees relative to its original direction).
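The avoidance rule described above can be expressed as a single decision function. The step sizes and the 1.2m detection distance follow the text; representing obstacles as points, the hypothetical corridor half-width of 0.4m used to decide whether a point is "ahead", and the behavior when the robot is boxed in are assumptions. Headings are in degrees, measured counterclockwise, so a right turn subtracts 90.

```python
import math

DETECT_RANGE = 1.2   # m, obstacle distance that triggers avoidance (from the text)
FORWARD_STEP = 0.5   # m (from the text)
SIDE_STEP = 1.0      # m (from the text)
HALF_WIDTH = 0.4     # m, hypothetical half-width of the corridor checked ahead


def blocked(pos, heading_deg, obstacles):
    """True if any obstacle point lies within DETECT_RANGE straight ahead."""
    h = math.radians(heading_deg)
    fx, fy = math.cos(h), math.sin(h)
    for ox, oy in obstacles:
        dx, dy = ox - pos[0], oy - pos[1]
        ahead = dx * fx + dy * fy      # distance along the heading
        lateral = -dx * fy + dy * fx   # signed offset perpendicular to the heading
        if 0.0 < ahead < DETECT_RANGE and abs(lateral) < HALF_WIDTH:
            return True
    return False


def advance(pos, heading_deg, dist):
    """Position after moving `dist` metres along the heading."""
    h = math.radians(heading_deg)
    return (pos[0] + dist * math.cos(h), pos[1] + dist * math.sin(h))


def bug_step(pos, heading_deg, obstacles):
    """One decision: go straight if clear, otherwise sidestep 1 m to the right."""
    if not blocked(pos, heading_deg, obstacles):
        return advance(pos, heading_deg, FORWARD_STEP)
    right = heading_deg - 90.0              # turn 90 degrees to the right
    if blocked(pos, right, obstacles):      # check before stepping sideways
        return pos                          # boxed in; a fuller planner is needed
    # After the side step the robot faces its original heading again.
    return advance(pos, right, SIDE_STEP)
```

Calling bug_step repeatedly with the same heading reproduces the loop in the text: the robot keeps sidestepping to the right until the way ahead is clear and then continues in its original direction.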

Summing echoes from all three headings
In order to better mimic the bat, which has a beam much wider than the Robat's beam, we examined an approach of summing the echoes returning from the three different headings (mentioned above) into one superposition echo, and then running the same detection, localization and mapping algorithms as described above.
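The superposition itself amounts to a sample-wise sum of the recordings from the three headings, performed separately for each ear. A minimal sketch, assuming that all recordings have the same length and are aligned to the emission time (the function name is hypothetical):

```python
def sum_echoes(recordings):
    """Sample-wise sum of equal-length recordings from the different headings."""
    length = len(recordings[0])
    if any(len(r) != length for r in recordings):
        raise ValueError("all recordings must have the same number of samples")
    return [sum(samples) for samples in zip(*recordings)]
```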

Evaluation of the mapping accuracy
In order to examine the acoustic map generated by the Robat, inspired by [40], we collected aerial images using a drone (DJI Phantom 4, DJI) to construct a complete ground-truth map of the area. This procedure was only performed for the large palm greenhouse (40x5 m²). The contour of the objects on both sides of the trail in the greenhouse was extracted and compared to the contour of the inflated map that was acoustically reconstructed by the Robat (both contours were marked manually). Each of the two contours was fit by a 55-coefficient order polynomial function which was then sampled at 500 points to get a high-resolution description of the contour. The two contours (real and Robat-estimated) were compared by calculating the root-mean-square distance between them (the average over these 500 points, S5 Fig).
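The comparison step can be sketched as follows, assuming that the polynomial coefficients of both contours have already been obtained by a separate fitting step that is not shown here. Measuring the distance as the offset between the two curves at the same position along the trail is also an assumption, and the function names are hypothetical.

```python
import math


def eval_poly(coeffs, x):
    """Horner evaluation; coeffs are ordered from the highest to the lowest power."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


def sample_contour(coeffs, x_start, x_end, n=500):
    """Sample a fitted contour at n evenly spaced positions along the trail."""
    step = (x_end - x_start) / (n - 1)
    return [eval_poly(coeffs, x_start + i * step) for i in range(n)]


def rms_distance(contour_a, contour_b):
    """Root-mean-square difference between contours sampled at the same positions."""
    n = len(contour_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(contour_a, contour_b)) / n)
```

For example, two parallel straight contours that are 0.5m apart give an RMS distance of 0.5m regardless of the number of sample points.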

Classification
Acoustic-based object classification was performed using a neural-network that was trained on a binary task: classifying whether an object was a plant or not. Only objects that were located closer than 3m from the sensing unit were classified. 0.035 s long echoes were used from both the right and left ears. These recordings, without the transmitted echo, were passed through three band-pass filters (20-40kHz, 40-60kHz and 60-100kHz). Each echo was thus represented by 6 signals (3 filters x two ears). Next, a set of 21 acoustic features (Table 2) was extracted from each band-passed recording following T. Giannakopoulos [41]. Each echo was divided into 23 equally spaced windows with an overlap of 40ms, and the 21 features were extracted for each window, generating a total of 483 dimensions per signal (21 features x 23 windows). The classifier was thus fed with 6 signals (483 dimensions each) and the decision of the majority of the six classifiers was used.
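The final decision rule and the balanced accuracy reported in the Results can be sketched as follows. Labels are encoded here as 1 for a plant and 0 for a non-plant. The text does not state how a 3-3 tie among the six classifiers was resolved, so the tie rule in this sketch (non-plant) is an assumption, as are the function names.

```python
def majority_vote(votes):
    """Label chosen by most of the per-signal classifiers (1 = plant, 0 = non-plant).

    Ties are resolved as non-plant; this tie rule is an assumption.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def balanced_accuracy(true_labels, predicted):
    """Mean of the per-class recalls for a binary task."""
    recalls = []
    for cls in (0, 1):
        total = sum(1 for t in true_labels if t == cls)
        if total == 0:
            raise ValueError("both classes must be present in true_labels")
        hits = sum(1 for t, p in zip(true_labels, predicted) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / 2.0
```

Balanced accuracy has a chance level of 50% even when the two classes are of unequal size, which is why it is a convenient measure for this task.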
The data was fed into a neural network with the following architecture: Input layer: 483 elements
Finally, to assess the statistical significance of our classification, we ran 100 permutations in which we assigned the training data randomly into the two classes (plants and non-plants), trained a classifier for each permutation and tested it on the same test-data.
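The permutation procedure can be sketched as follows. The callable fit_and_score is a hypothetical placeholder for training a fresh classifier and returning its score on the test data. Counting the observed run itself in the p-value, so that the smallest possible value with 100 permutations is about 0.01, is a common convention that is assumed here and not taken from the text.

```python
import random


def permutation_p_value(observed_score, train_x, train_y, test_x, test_y,
                        fit_and_score, n_permutations=100, seed=0):
    """Fraction of label-shuffled runs that score at least as well as the real one.

    fit_and_score(train_x, train_y, test_x, test_y) is a hypothetical callable
    that trains a new classifier and returns its score on the test data.
    """
    rng = random.Random(seed)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(train_y)
        rng.shuffle(shuffled)  # break the link between features and labels
        score = fit_and_score(train_x, shuffled, test_x, test_y)
        if score >= observed_score:
            at_least_as_good += 1
    # The +1 terms count the observed run itself, so p is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)
```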
We also tested several additional classification methods before choosing the neural-network. We tested a KNN (K nearest neighbors) classifier with five different distance measurements: Mahalanobis, Euclidean, Correlation, Minkowski and Canberra. We also tested two additional approaches for dimensionality reduction (before using the KNN) including PCA and LDA. In addition, we also tested a linear SVM classifier. For all classifiers, we used the same input features (see above).
The results were similar for most classifiers, but the neural network performed slightly better than the others (S8 Fig).

Supporting information