High-Resolution Underwater Mapping Using Side-Scan Sonar

The goal of this study is to generate high-resolution sea floor maps using a Side-Scan Sonar(SSS). This is achieved by explicitly taking into account the SSS operation as follows. First, the raw sensor data is corrected by means of a physics-based SSS model. Second, the data is projected to the sea-floor. The errors involved in this projection are thoroughfully analysed. Third, a probabilistic SSS model is defined and used to estimate the probability of each sea-floor region to be observed. This probabilistic information is then used to weight the contribution of each SSS measurement to the map. Because of these models, arbitrary map resolutions can be achieved, even beyond the sensor resolution. Finally, a geometric map building method is presented and combined with the probabilistic approach. The resulting map is composed of two layers. The echo intensity layer holds the most likely echo intensities at each point in the sea-floor. The probabilistic layer contains information about how confident can the user or the higher control layers be about the echo intensity layer data. Experimental results have been conducted in a large subsea region.


Introduction
Overview Acoustic sensors (sonars) are the modality of choice in underwater robotics [1][2][3]. One of these sensors is the Side-Scan Sonar (SSS) which provides echo intensity profiles of the sea bottom. Side-Scan Sonars are widely used in sea floor imagery, and should remain in the near future, basically for economic reasons, but also because of its ease of deployment: in some cases they have a towfish structure, so there is no need for complex mountings on Autonomous Underwater Vehicles (AUV), Remotely Operated Vehicles (ROV) or ships.
SSS is used for providing imagery to a wide variety of scientific applications in different underwater scenarios including deep sea, shallow waters, lakes or rivers.
Geological aspects like tectonic, volcanic or hydrothermal structures, as well as sedimented areas, among others, have been reported and studied in detail thanks to different sidescan sonar campaigns carried out in the last twenty years. Blondel in [4] presents a detailed compilation of representative examples using sidescan in such a scientific terrain.
The extension from geology to biology applications stems from the fact that many studies report a strong relation between benthic community structure and substrate type [5]. Thus, marine scientists have used sidescan sonar to assist in mapping and understanding the spatial extent of seabed habitats and benthic ecosystems.
Acoustic surveys typically cover tens or hundreds of square kilometers, thus conventional sidescan imagery interpretation made by-eye is hardly feasible and prone to subjectivity. As a consequence, different alternatives for automatic segmentation of benthic habitat using sidescan data have been proposed [6][7][8]. Comparing the output of the algorithms with the habitat ground truth from grab or video samplings indicates accuracies of the classification strategies above 80% Still, the SSS imagery has some properties that jeopardize automated analysis [9]. For example, SSS provide misrepresented slices of the sea bottom [10], especially in non-flat terrains. Moreover, joining these slices to produce dense acoustic maps require accurate sensor pose data and significant interpolation in most cases. Additionally, SSS, as well as any other kind of sonar, ensonify the observed region unevenly [11] and, thus, the sensor itself influences the perception of the sea bottom.
The three aforementioned problems prevent, among others, the use of SSS data to build large scale sea-floor maps involving complex robot trajectories [12]. In most cases, the AUV is forced to move through straight transects to reduce the need for accurate localization and data interpolation. Even in these constrained scenarios some problems arise when trying to fuse data corresponding to overlapping regions.
A common approach to produce geometrically meaningful maps is the occupancy grid [13]. Although occupancy grids are aimed at distinguishing free and occupied regions, similar approaches have been applied with relative success in the context of SSS mapping [14,15]. Unfortunately, the occupancy grid techniques still present some problems when used to construct dense SSS maps. These problems appear because most of the existing occupancy grid algorithms decompose the high-dimensional mapping problem into a set of one-dimensional problems, being the value of each cell estimated independently [16].

Proposal
Our goal is to generate high-resolution maps, beyond the SSS resolution, of large sea floor regions using SSS by explicitly dealing with the aforementioned problems through the following four steps.
The first step, the Swath Correction, consists in improving the individual data items provided by the SSS. This is achieved in three stages. During the first one, the effects of the uneven ensonification pattern are removed using a physics-based sensor model [11]. Then, the central region of each data item (the blind zone) is detected and removed as it does not provide useful information. Afterwards, the data is projected to the actual sea floor. As, in most cases, bathymetry is not available, the floor is assumed to be flat to perform such projection. The error due to this assumption is properly modelled and evaluated, showing at which extent it is affordable.
The second step, localization, consists in estimating the AUV or the SSS pose. To this end, we use a Doppler Velocity Log (DVL) continuously and a Global Positioning System (GPS) when the AUV surfaces. The data provided by both sensors is fused using an Extended Kalman Filter (EKF). In order to deal with different sampling rates between the localization sensors and the SSS, a constant velocity model is adopted in the EKF prediction step.
During the third step, Probabilistic Map Building, a probabilistic map of the sea floor is built. This map does not hold information about the structure of the sea floor but about the probability of each mapped region to be observed by the SSS. To this end, some probabilistic SSS models as well as a method to combine the probabilistic information due to different overlapping measurements is used. These models share some similarities with the ones presented in [17] and used in [18,19] with the following main differences. Our proposal focuses on fusing probabilistic and echo intensity information whereas the previous proposals were aimed at discarding useless data. Also, our proposal deals with echo intensity vectors whilst the previously mentioned require a single TOF reading with no echo intensity data.
During the last step, Echo Intensity Map Building, the probabilistic information is combined with the echo intensity information coming from the SSS to provide a meaningful representation of the sea floor. The proposed approach produces high resolution maps, even at higher resolutions than those of the SSS because it is able to combine the echo intensity information coming from several SSS measurements gathered at different poses whenever they overlap.
The better the resolution and the precision of the sidescan sonar is, the more accurate are the automatic classification algorithms results. Hence, the proposal here presented can benefit scientific applications that use sidescan sonar for mapping and segmentation.
It is important to emphasize that these steps are not executed sequentially, but are sensor driven: when a new SSS measurement is gathered, it is improved and used to ammeliorate the probabilistic map and to extend the echo intensity map. When a new GPS or DVL measurement arrives, it is used to update the localization data. Fig 1 summarizes the aforementioned steps.
As a result of these steps, a map consisting of two layers is built. On the one hand, a probabilistic layer. The data in this layer is useful to, for example, help the user or the high-level control layers to decide which areas have been properly mapped and which ones deserve further reobservation. On the other hand, an echo intensity layer, which properly fuses all the obtained echo intensities provided by the SSS.
The novelties in this proposal are the following. First of all, the combination of the aforementioned techniques. For example, the proposed physics-based model to remove the uneven ensonification pattern has never been used in the context of dense acoustic map building. Also, the literature is scarce on models analysing the effects of assuming a flat floor. Thus, the proposed analysis is also one of the novelties in this paper.
Additionally, the proposed approach to build the echo intensity map deals with the problems of occupancy grid building by adopting a technique based on forward sensor models [16]. This approach explicitly takes into account the horizontal SSS opening, which is usually not considered in the literature. Finally, our proposal generates a two-layer map providing information not only about the actual sea floor structure but also about how confident the user or higher-level modules can be about that information.
Our proposal is tested using real SSS and DVL data gathered along a large subsea trajectory in Port de Sóller (Mallorca, Spain).
The paper is structured as follows. First, the SSS operation and the basic terminology is presented in Section The Side-Scan Sonar. Section The flat floor assumption models the effects of the flat floor assumption. All the aforementioned processes to improve the SSS data are detailed in Section Swath correction. Afterwards, Section Localization presents the specific EKF formulation used to deal with DVL and GPS sensors.

The Side-Scan Sonar
Overview Fig 2 illustrates the operation of a SSS mounted on a AUV navigating at an altitude h. An AUV usually has two SSS heads, symmetrically placed on port and starboard with a fixed angle θ. Each sensing head periodically generates an ultrasonic pulse which, after reaching the sea floor, is partially scattered back to the sensor. The SSS then analyses the received echo to obtain information about the ensonified region (ER). The expansion of the sound wave with time is modelled by the sensor opening α in the YZ plane and the sensor opening ϕ in the XY plane.
The sensor operation is as follows. After each ultrasonic pulse emission, the sensor records the received echo intensities at fixed time intervals until a new pulse is emitted. Henceforth, this recorded data vector will be referred to as a swath and each of its items, which store an echo intensity providing information about the structure and reflectivity of the sea floor, will be referred to as a bin. The Time Of Flight (TOF) corresponding to each bin determines the The main steps are swath correction, in charge of improving the individual SSS swaths; probabilistic map building, whose task is to generate a probability map estimating the likelihood of each pixel to be observed; echo intensity map building, which constructs a grid where each cell holds the most likely echo intensity corresponding to that cell; and localization, whose task is to estimate the AUV pose. These steps are not executed sequantially, but they are sensor-driven. slant range r s of a point p in the ER. The sampling period determines the slant range resolution δ s and the time between emitted pulses determines the maximum sensor range.
The acoustic echo intensity recorded in each bin is mainly influenced by the reflectivity of the sea floor, but also it is attenuated depending on the travelled distance and corrupted by the SSS ensonification pattern. Our proposal to reduce these undesired effects is referred to as the intensity correction [11] and is described in Section Intensity correction.
The position of a point p in the ER is usually expressed using the polar coordinates (r s , θ s ) in the YZ plane. The angle θ s is known as the grazing angle and can computed as follows: where h p is the altitude of p. If h p is unknown then θ s cannot be computed, and the point responsible for the echo at r s may be anywhere in the angular interval y À a 2 ; y þ a 2 Â Ã . Computing the elevation map of the points in the sea floor using only SSS data is a hard, time-consuming and error prone task addressed by few researchers [10]. Because of that, if no bathymetry is available through other sensors, a common approach is to assume that all the points in ER have zero altitude (h p = 0). This assumption, which is known as the flat floor assumption, introduces an error when building maps. The effects of the flat floor assumption will be analysed in Section The flat floor assumption.
Because of the sensor opening ϕ a similar problem appears in the XY plane. The bin with slant range r s is due to the objects lying in an arc q, as shown in Fig 2. One or more objects lying inside q may be responsible for the received echo intensity. This fact is usually neglected in the literature, assuming a pencil-like thin beam in the XY plane because ϕ is very small in Side-scan sonar characterization. The AUV navigates at an altitude h with two SSS sensing heads symmetrically placed on port and starboard with a fixed angle θ. The angles α and ϕ are known as the sensor openings and model the sound expansion rate in the YZ and XY planes respectively. A point p in the ensonified region is usually expressed in polar coordinates using its slant range r s and its grazing angle θ s , though sometimes the point altitude h p is also used. Given this information, the point position can be expressed as a range r g over the Y axis. Finally, due to the ϕ opening, the exact position of a point in the XY plane cannot be determined. Instead, the point can be anywhere over the arc q. doi:10.1371/journal.pone.0146396.g002 High-Resolution Underwater Mapping Using Side-Scan Sonar most SSS. A novelty in this study is that the opening in the XY plane is not neglected, but properly used to fuse the information coming by overlapping ER.
The ground range r g of a point p is the projection over the Y axis of the vector ! op joining the AUV reference frame and the point p. The ground range is useful to properly map each bin to its corresponding position in the sea floor. The process of computing the ground range for every bin is the so called slant correction. Our approach to this process is described in Section Slant correction.

The swath structure
A typical SSS has two synchronized sensing heads and, thus, the swaths are provided in pairs: one provided by the port sensing head and one provided by the starboard sensing head. Let us, for the sake of clarity, join each two simultaneously gathered swaths in a single vector. Henceforth, the term swath will refer to this new vector. Fig 3 shows an example of a swath corresponding to the port (bins 1 to 250) and starboard (bins 251 to 500) sensing heads of a particular SSS. The central region with low echo intensities is the so called blind zone and it corresponds to those distances from the sensor where no sea floor was detected. Thus, echo intensity values in the blind zone are due to sensor noise and suspended particles in water. The first significant echo outside the blind zone is the First Bottom Return (FBR) and corresponds to the first sea floor point that produced an echo.

Acoustic image formation
As the AUV moves, several swaths are gathered. By aggregating swaths, an acoustic image is built. The most basic form of acoustic image building consists in putting the swaths together, one next to the other. Fig 4 shows an example of this kind of images, where each echo intensity is mapped to a grayscale level so that lower intensities are darker. The changes in the black strip width, which is the blind zone, reflect changes in the AUV altitude: the higher the altitude, the wider the blind zone.
As the blind zone does not hold useful information about the environment, but noise and echoes due to suspended particles in water, it is desirable to detect and remove it. This process is known as blind zone removal. Our approach to the blind zone removal is described in Section Blind zone removal.
Although this approach may be sufficient for human inspection, it provides a distorted representation of the actual sea floor for two reasons. First, it does not take into account the specific AUV motion between swaths. Second, the vertical axis does not represent the ground range but the slant range. Solving the first problem involves AUV localization or SLAM techniques, as well as the use of interpolation and blending techniques to fill gaps and combine overlapping swaths. This is, precisely, the main goal of this paper and is going to be described in detail in further sections. As for the second problem, properly mapping slant ranges to the sea floor consists in performing the aforementioned slant correction.

The flat floor assumption Overview
Properly projecting the SSS swaths to the sea floor is an error prone, complex and time consuming task [10]. Thus, unless bathymetry is available, it is common to assume that all the objects in the ER have zero altitude (h p = 0) with respect to the point where the AUV altitude is measured. This is the so called flat floor assumption.
At the extent of the authors knowledge, very few studies exist that explicitly analyse the errors introduced by the flat floor assumption in SSS imagery. For example, [20] states the problem but focuses on correcting its effects using bathymetry. [21] also states the problems but mainly accounts for them when analysing acoustic images, neglecting their effects during the acoustic image formation. In general it is assumed that if the terrain roughness is small when compared to the AUV altitude, the errors are neglectable. This is the case of [2] or [11].
In these studies, the ocean floor is assumed to be locally flat and experimentally shown that this assumption is affordable in real underwater scenarios with relatively flat floors.
The goal of this Section is to model the errors due to the flat floor assumption in order to quantify them and show with which kind of sensors and environments such assumption is reasonable.

Error model
From Figs 2 and 5a it is easy to see that the ground range r g can be expressed as a function of the measured slant range r s , the AUV altitude h and the altitude h p of the detected object as follows: Performing the flat floor assumption means using h p = 0. Accordingly, we define the flat floor assumption error e (see Fig 5a) as the difference between the true ground range and the one corresponding to h p = 0 as follows: The slant range r s is computed from the TOF of each bin in the swath and the altitude can be measured by sensors commonly available in underwater robots, such as DVL or the SSS itself. Moreover, in case no sensor measuring h is available, it can be easily computed from the FBR, as it will be shown later. To the contrary, h p is unknown. Thus, for a given bin and time, the error only depends on the unknown object altitude.
We define an error to be fully neglectable if it is below the SSS resolution. That is, we assume that errors below that resolution will not affect the acoustic image formation.

Minimum and maximum object altitudes
Given an altitude h and a slant range r s , not all object altitudes h p are possible, especially if the opening α and the angle θ are considered. From Figs 2 and 5b the minimum and maximum possible object altitudes are as follows: if y À a 2 ! 0 and y þ a 2 p 2 . These two conditions mean that the SSS is mounted on the AUV so that the acoustic beam never crosses the Y or the Z axes, which happens in most SSS configurations. Let the minimum and maximum object altitudes leading to fully neglectable errors be denoted as h 0 p;min ðr s ; hÞ and h 0 p;max ðr s ; hÞ respectively.

Across-track slope
There is a final consideration to be performed. Even if we assume that protuberances and holes in the sea floor are within the fully neglectable error ranges, the ocean floor slope in the acrosstrack direction (Y axis) has also to be taken into account, as it may lead to significant changes in h p . The minimum and maximum terrain slopes leading to fully neglectable errors can be where r s, max is the sensor range. It should be noticed that slopes outside this range lead to not fully neglectable errors when considering the absolute ground range, but the errors would be much smaller locally, leading to almost locally perfect acoustic images for a much wider range of slopes.
As a conclusion, the models presented in this Section make it possible to quantify the errors due to the flat floor assumption. These models will be configured with our particular sensor parameters in Section Experimantal result, showing that the flat floor assumption is more than reasonable with our specific AUV configuration and also affordable in similar missions. For this reason, the flat floor assumption is going to be performed throughout this paper.

Intensity correction
It can be observed in Figs 3 and 4 that the echo intensities are much higher in the regions surrounding the blind zone. This effect is due to a non homogeneous ensonification leading to brightness variations in the resulting acoustic image that may difficult further automated processing. Some studies deal with this problem, but mainly using some environment dependant heuristics [9].
Our proposal has a well founded theoretical basis [11], as it relies on sea bed reflectivity and sound propagation models. To remove the variable ensonification component from the acoustic image, we model the sea floor as a Lambertian surface [10,22], which scatters incident energy uniformly in all directions. Under this assumption, the echo intensity I m (p) returned by a SSS measurement m from a sea floor point p = (r s , θ s ) is modelled as follows: where ϕ(p) denotes the ensonification intensity, R m (p) is the reflectivity of the sea floor, β m (p) is the incidence angle, which coincides with the grazing angle θ s under the flat floor assumption, and K is a normalization constant. We derived the ensonification intensity model from the sensitivity pattern model proposed by Kleeman and Kuc [23]. Accordingly, ϕ can be expressed as follows: where M is a proportionality constant, J 1 is the Bessel function of the first kind of order 1, λ and f are the emitted pulse wavelength and frequency respectively and a is the transducer radius. This Equation accounts for the two main aspects that modify the intensity perceived at a given point in the sea floor: its angular position with respect to the sensor acoustic axis and its distance to the sensor origin, which is related to the sound attenuation with the travelled distance.
As it may be difficult to find an accurate value for the transducer radius a from the sensor specs, we propose the following procedure. Let θ 0 be the first occurring θ s producing ϕ(.) = 0. Thus, θ 0 defines the main sonar lobe: the ensonification intensities corresponding to grazing angles in the interval [θ 0 , 2θ − θ 0 ] are due to the main sonar lobe, whilst those outside the interval are produced by the secondary lobes. The value of θ 0 can be derived [23] from Eq 9 and is as follows: Let us assume that the whole sensor opening α corresponds to the main lobe, which is reasonable because manufacturers usually build sensors in this way. The grazing angle corresponding to the positive boundary of the ensonified region according to the opening is θ − α/2 (See Fig 2). Thus, using θ 0 = θ − α/2 in Eq 10 leads to a fair approximation of the transducer radius a: According to Eq 8, the measured echo intensities, that is, the bin values in each swath, depend on the sea floor reflectivity, the ensonification intensity and the incidence angle. An acoustic image taking only into account the sea floor reflectivity would be a more realistic representation of the environment, as it discards the effects due to the sensor itself. As we have an ensonification model and we know the incidence angles, we can express the corrected value for each point p in the ER as follows: where K 0 = (K Á M) −1 . Thus, the acoustic image built from R m (p) constitutes the intensity corrected image of I m (p). Further methods described in this paper make use of the corrected swath R m (p). However, all of them can also use the echo intensities I m (p). Because of that, to ease notation, the terms I m (p) and echo intensity will be used to refer to a swath and the meaning of the values it holds respectively, either if they are actually echo intensities or the corresponding reflectivity values.

Slant correction
As stated previously, the process of projecting each bin to its corresponding position in the sea floor is known as slant correction. Under the flat floor assumption, this can be easily achieved through Eq 2 using h p = 0. However, from an algorithmic point of view and to avoid gaps in the slant corrected swath, it is preferable to to have the slant range as a function of the ground range: In this way, for every possible ground range, the slant range can be computed and the corresponding echo intensity value obtained. The ground range resolution δ g when performing the slant correction can be chosen depending on the desired granularity. However, a good choice is to use the same value as the slant range resolution δ s , which is sensor dependant, and that is the approach adopted in this paper. Also, as the slant range corresponding to a specific ground range may lie between two bins, linear interpolation is used to obtain the echo intensity. Algorithm 1 summarizes this process. The term I m (Á) denotes the input swath. It is a vector where each value represents the echo intensity corresponding to a single bin. The term I m g ðÁÞ denotes the corrected swath. It is a vector where each value represents the echo intensity corresponding to a bin in the ground plane. The size of a bin in the ground plane is the ground range resolution δ g .
For the sake of simplicity, henceforth the term swath will refer to the ground projected swath I m g ðÁÞ unless the contrary is stated.

Blind zone removal
Under the flat floor assumption, the AUV altitude h and the ground range corresponding to the FBR (r FBR ) are related through θ and the sensor opening α as follows: If the altitude h is known at each time step, the r FBR can be computed and used to perform the blind zone removal. Assuming that the two SSS sensing heads are symmetrically placed on port and starboard, the blind zone removal consists in discarding all the bins whose ground ranges lie between −r FBR and r FBR . If they are not symmetrically placed, two r FBR should be computed, one for each sensing head. As stated previously, these bins do not hold useful information about the environment, but noise or echoes from suspended particles in water and thus they shall be discarded.
It should also be noticed that, in the unlikely case that the altitude h is not available, the FBR could be determined using some signal processing technique and the altitude h computed using Eq 14.

Localization Overview
From the localization perspective, the roll and pitch angles are assumed to be zero, which is reasonable in most surveying missions where the robot control is designed to keep these angles to that value. Thus, the localization problem is a 4DOF problem where x, y, z and the yaw angle ψ have to be estimated. Result: I m g ðÁÞ: Corrected swath. Vector containing the echo intensity for each bin. 1 r g, min = r g (r s, min , h, 0); 2 r g, max = r g (r s, max , h, 0); 3 δ g δ s ; 4 for r g r g, min to r g, max step δ g do Without loss of generality, this Section focuses on the localization sensors used in our particular configuration: a DVL providing velocities over the three axes and a GPS providing absolute pose when the AUV emerges. Additionally, our DVL provides altitude information and, also, absolute heading by means of its internal compass. The proposed localization approach relies on the standard EKF formulation, similarly to [14]. Accordingly, other sensor configurations could be easily used.
It has to be taken into account that localization, in this study, is not the goal but only the mean to produce accurate sea floor maps using SSS. Thus, the localization modules also have to pay attention to the different sampling rates between the localization sensors and the SSS. In our particular case, SSS provides data at higher rate than DVL and GPS. Thus, the proposed localization approach has to be able to provide the best possible pose estimate even when no DVL or GPS is available. To this end, DVL and GPS data are used in the EKF update step and a constant velocity model is used in the prediction step.

The constant velocity model
The state vector is composed of the 3D AUV position and the yaw angle and the corresponding velocities, which are mandatory because of the adopted constant velocity model and also to allow using the DVL data during the update step: According to the constant velocity model, the prediction is performed as follows: where s t − 1 and c t − 1 denote sin(ψ t − 1 ) and cos(ψ t−1 ) respectively. By means of this model, when no sensor data is available, the prediction step will assume that velocities are the same as the last time they were measured and provide a pose estimate accordingly.

The update step
The update step has to take into account the two aforementioned sensors (DVL and GPS), that may provide data at different sampling rates. As a matter of fact, GPS updates are likely to be very infrequent as they only happen when the AUV surfaces. As for used DVL device, it provides altitude, heading and velocities as follows: The GPS data, when available, is adapted to the following format: Also, when the mission starts, the GPS position and the absolute heading are used to initialize the state vector.
As these sensors provide data directly related to specific state vector items, the measurement functions h DVL and h GPS and associated Jacobian matrices H DVL and H GPS are straightforward: Localization is achieved by using the constant velocity model in the EKF prediction step and the aforedescribed measurement vectors and functions in the update step.

Probabilistic map building
Overview At this point, we have described how each individual swath can be improved after the SSS gathers it and also how to properly estimate the AUV pose even at higher frequencies than those of the localization sensors. Also, we have shown the extent at which the flat floor assumption is affordable and, as the experimental results will corroborate, that it is reasonable to perform such assumption in a wide range of scenarios including ours.
This Section describes how a high resolution and high quality map of the subsea floor can be built using the improved swaths. This map is a 2D representation of the sea floor and, thus, it focuses on the XY plane (Fig 2). Being our proposal based on probability theory, we first describe the probabilistic SSS model and the probability map and, then, a method to combine the probabilistic information with the swath data to build the map.

The SSS model
In the context of this paper, a map M is composed of two layers: the probability map M P and the echo intensity map M I . Both are matrices of R rows and C columns where each cell represents a squared area of the sea floor of δ M × δ M meters. For the sake of clarity, let the cells be referred to as pixels and δ M as the map resolution. In the case of M P each pixel holds information about its own probability to be observed. As for M I each pixel holds information about the echo intensity of the sea floor area it represents. We would like to emphasize that, as stated previously, the term echo intensity may refer to the raw echo intensity or to the corrected one processed as described in Section Intensity correction.
Let us express a pixel q by the coordinates of its four corners q 0 , q 1 , q 2 and q 3 . Let q m i ¼ ðr m q;i ; y m q;i Þ be the polar coordinates of the corner q i in the ground plane with respect to the reference frame of a specific SSS measurement m 2 S, where S denotes the set of all the gathered measurements, projected to the ground plane. In order to compute the position of the map pixel corners relative to the the SSS measurement, the position of the SSS measurement with respect to the map is required. This position is provided by the localization modules and, as described in Section Localization, it is available at the desired frequency thanks to the adopted constant velocity model.
Let imin and imax be the indexes of two pixels corners so that y m q;imin and y m q;imax are the rightmost and the leftmost angles. Finally, let r m g;min and r m g;max denote the minimum and maximum ground range, respectively, that the SSS can reach. Fig 6 illustrates these concepts.
The first step is to model the Probability Density Function (PDF) p of a point q i to be observed by the SSS measurement. A realistic, physics-based, approach is to adapt Eq 9 to model the probabilities so that the points in the ER receiving higher ensonification energy are more likely to be observed. However, this approach is time consuming and the computation requirements when building large maps may be extremely large. Instead, our proposal is to model the probabilities as a Gaussian whose parameter is the angle y m q;i , which captures the shape of the acoustic main lobe. Accordingly, the PDF pðy m q;i Þ is defined as follows if the point range is within the observable range: Fig 6. The SSS model. Each map cell or pixel represents a squared area of δ M × δ M meters. Each cell can be represented by the coordinates q 0 , q 1 , q 2 , q 3 of its four vertexes. The terms y m q;imin and y m q;imax denote the angle with respect to the AUV reference frame of the rightmost and leftmost vertexes of a particular cell as observed from the AUV. The terms r m g;min and r m g;max denote the minimum and maximum ground range, respectively, that the SSS can reach. In this way the probability is maximum when the point lies on the acoustic axis and decreases as the angular position gets farther from the acoustic axis. The standard deviation is set to φ 2 so that the 95% of the probability lies in the sonar main acoustic lobe. The next step is to compute the probability of a pixel q to be observed by the measurement m. In other words, the goal is to compute the probability of a squared region of the sea floor to be observed by m. Taking into account that the PDF in Eq 23 only depends on the angle y m q;i and that a pixel q lies in the angular interval [y m q;imin , y m q;imax ], the probability P m G ðqÞ can be computed as follows: As this Equation refers to a whole pixel, it may happen that the pixel partially lies within the observable range. Our proposal in this case is to compute the probability according to Eq 24 if the pixel is either fully or partially inside the observable range and to assume the probability is zero otherwise.
Unfortunately, using Eq 24 may be time consuming. It has to be taken into account that for every pixel in the map the model has to be computed for each gathered measurement, as it will be shown later. In order to reduce the computational cost, two additional models are proposed.
The first additional model consists in using a triangular PDF so that the probability is maximum when y m q;i ¼ 0 and zero outside the angular interval [À φ 2 , φ 2 ], decreasing linearly when approaching the interval boundaries. By integrating this PDF similarly to Eq 24, the probability of a pixel q within the measurement range to be observed is: The second additional model aimed at reducing the computational cost consists in using a uniform PDF whose variable is y m q;i . Accordingly, the probability of a pixel q within the measurement range to be observed is: In this Section three approaches to model the probability of a map cell q to be observed by a particular measurement m have been proposed. Henceforward, these three approaches will be referred to as the Gaussian Approach, the Triangular Approach and the Uniform Approach and their associated probabilities as P m G , P m T and P m U respectively. Also, as they can be used interchangeably, depending on the desired accuracy and available computational resources, the notation P m will be used to refer to any of the three approaches to compute the probabilities. Some examples involving the three approaches are shown in Section Experimental results.

The probability layer
This Section focuses on building the probability layer M P of the map. Given the set of SSS measurements m 2 S, a pixel q 2 M P may not have been observed by any measurement or it may have been observed by one or more measurements. Fig 7 illustrates the four situations in which a pixel can be. Pixels q0 and q1 have not been observed by any measurement and, thus, their probability to be observed is zero. However, when building the reflectivity map these two pixels have to be considered differently, as it will be shown in Section Geometric map building.
Pixel q2 may have been observed by m1 with probability P m1 (q2), as described in Section The SSS model. As for pixel q3 it may have been observed by m0, m1, m2, m3 or m4 with probabilities P m0 (q3), P m1 (q3), P m2 (q3), P m3 (q3) and P m4 (q3) respectively. Accordingly, the probability of q3 to be observed is: where E mi qj denotes the event pixel qj observed by measurement mi and has probability P mi (qj). In general, where S q & S is the set of measurements that may have observed q. That is, As for ε, its value depends on the SSS model. If the Triangular or the Uniform models are used, then ε = 0. In the case of the Gaussian model ε should be greater than zero. Alternatively, a geometric approach could be used by including in S q the set of measurements whose observed sector, defined by r m g;min , r m g;max and ϕ, does contain q. Assuming that the events E mi qj are mutually independent but not mutually exclusive, the probability P(q) can be computed as: Algorithm 2 summarizes the process to build the probabilistic layer. For each pixel, the coordinates q G of its four corners with respect to a global reference frame are computed in line 3. Then, the set S q of measurements that may observe the current pixel is computed as follows.    y m q;imax ). Then, line 10 decides if the measurement m may observe the pixel q using either Eq 29 or a geometric criteria. If it may be observed, both m and q are stored in S q and Q respectively.
Afterwards, the probability of the current pixel is computed from line 16 to line 22 taking advantage of the data previously stored in S q and Q. In the case of pixels not observed by any measurements, line 22 assigns zero probability to them.
As it can be observed in the algorithm, line 7 and the check in line 8 are executed R Á C Á |S| times. As an example, in the particular experimental setup described in Section Experimental setup, the number of measurements |S| is 68434, including the port and the starboard sensing heads. If a map of 1000 × 1000 pixels is to be built, then the probabilistic model has to be evaluated, in the worst case, 6.8434 Á 10 10 times. Line 10 is also likely to be executed in a large number of iterations. As this may be extremely time consuming, it is crucial for these three lines to be fast.
However, the most time consuming part of the algorithm lies in the loop between lines 16 and 22 because the probabilistic model P m (q) is evaluated there. As the number of iterations in this loop depends on the measurements that are likely to observe the current pixel, having an accurate rejection criteria when building S q is very important.
Finally, we would like to emphasize that it is not necessary to store the probability layer M P as the values for each pixel are never re-used after the corresponding cell in the echo intensity layer M I is evaluated. However, the probability layer could also be used in other tasks to let the user or other higher-level AUV functionalities know how confident it can be about the information in every mapped region.
Some experiments showing the probability layer M P using the three aforedescribed SSS models with real data in a large subsea mission are provided in Section Experimental results.

The echo intensity layer
This section describes our approach approach to build the echo intensity layer M I . The first step is to compute the echo intensity value V m (q) for a pixel q according to a measurement m. This is achieved by linearly interpolating the values in I m g for each pixel corner and then computing the mean of the four obtained values as follows: 8 > > > > < > > > > : d g c and w 2 = 1− w 1 . In case the pixel q is not observed by measurement m, we assign it the arbitrary value of 0.
Our approach to compute the echo intensity value V(q) corresponding to a pixel q taking into account all the measurements m 2 S q that may have observed it is to weight the individual V m (q) according to the corresponding probability P m (q). In this way, an echo intensity value corresponding to one measurement is more influential in the final value if the measurement has higher probability of having observed the pixel. That is: By computing V(q) for all the pixels in the map, the echo intensity layer M I is built. Some experiments regarding the echo intensity layer are presented in Section Experimental results.
Geometric map building Fig 7 shows that some pixels may not be observed according to the proposed probabilistic model. In that example, neither pixel q0 nor pixel q1 were observed and, thus, the proposed approach to build the echo intensity layer will assign them a value of zero.
However, these two pixels exemplify two different situations. As for pixel q0, its value should be certainly set to zero or to any value denoting there is no information about it. Contrarily, the value of pixel q1 could be extrapolated from the surrounding measurements. By doing so, although the probability to observe q1 is actually zero, the visual gaps in the resulting map can be properly filled.
To perform the extrapolation, which is called the geometric map because it is based on pure geometry, we start by representing the environment as a polygonal mesh. The four edges of each polygon correspond to the acoustic axes of consecutively gathered measurements joined by their extrema as illustrated in Fig 8, where each polygon p i is built by joining the acoustic axes of measurements m i and m i+1 .
Then, every pixel in the map is checked against all the polygons to decide whether it is inside any of them. A pixel is said to be inside a polygon if at least one of its corners is inside. If the goal of the geometric map building is only to fill the gaps of the echo intensity layer, this step can be extremely speed-up by checking only those pixels whose probability to be observed is zero. If the pixel is inside, its four corners are projected to all of the involved acoustic axes and an echo intensity value for each of them is computed through linear interpolation. Also, the perpendicular distance from each of these corners to all of the involved acoustic axes is computed. The final value assigned to this pixel is a weighted mean of the obtained echo intensities whose weights are inversely proportional to the corresponding perpendicular distance.
By means of this method, the gaps in the echo intensity layer can be filled. Also, this approach can be used standalone to build an alternative representation of the sea floor.

Experimental setup
The experiments shown in this section have been performed using an Imagenex SportScan SSS with two sensor heads attached to an EcoMapper AUV. The main parameters of this system are shown in Table 1. As for the mean altitude being h = 5m, we would like to emphasize that the true values during all the mission are almost constant and very close to the mean value, which is a common situation in underwater surveying missions. The AUV was also endowed with a DVL, providing dead reckoning and altitude, and a GPS providing position data when in surface. Moreover, the DVL is also endowed with a compass and, thus, it provides absolute heading data.
The AUV mission consisted of a sweeping trajectory along more than 4Km in Port de Sóller (Mallorca, Spain). During the straight transects, the AUV was underwater gathering SSS and DVL data. During the turns, the robot emerged to obtain the GPS position. Fig 9 depicts the AUV trajectory, computed using the localization approach described in Section Localization.
Port de Sóller is a small bay in the North coast of Mallorca (Balearic archipelago, Spain). The seafloor of the bay is sandy with little sea-grass meadows laid out quite regularly. Rock outcrops are frequent at the mouth of the bay. The experimental area depth ranges from 12 to 22m and the significant points appearing on the sonar images mostly correspond to stones and Posidonia Oceanica plants. Along some of the transects the sonar visualizes a 80cm diameter pipe used for fresh water supply.
All the SSS measurements used in these experiments, both raw and processed, are available in [24] together with some high-resolution maps and additional mission data.
Along the experiments, some data regarding the time consumption will be provided. In these cases the measured times correspond to a Matlab implementation running on Ubuntu 14 and an Intel i7 CPU at 3.1GHz. The flat floor assumption Let us evaluate the effects of the flat floor assumption by parametrizing the error models presented in Section The flat floor assumption with the specific SSS used in this paper. Fig 10a shows the errors according to Eq 3. For each slant range, the minimum and maximum object altitude have been computed using Eqs 4 and 5. As for the AUV altitude, values below the one shown in the figure have not been considered as in that cases it is not possible to perform the flat floor assumption. Let the smallest possible slant range according to this criteria be denoted by r s, min . The largest value r s, max = 30m is one of the sensor parameters. Fig 10b shows all the combinations of slant range and object altitude that lead to fully neglectable errors. In the worst case, which appears with small slant ranges, object altitudes between −16cm and 16cm lead to fully neglectable errors. In the best case, corresponding to large slant ranges, the flat floor assumption leads to fully neglectable errors for objects with altitudes ranging from −66cm to 71cm. If the reasonable mean slant range of 15m is considered, then the flat floor assumption leads to no errors for object altitudes between −33cm and 32cm.
To illustrate this analysis, let us focus on some particular data in the dataset used in this paper. To this end, let us first introduce one additional concept. It is well known that the height of an object observed by a SSS can be computed by measuring the length of the projected shadow. This task is extremely difficult and error prone to be automated, but for some especially contrasted regions it can be done manually. Let r s1 be the slant range of the last nonshadow point of the object whose height is to be computed. Let r s2 be the largest slant range of the shadow projected by the object. It is straightforward to see that the object height is: By visual inspection, we found that the structure shown in Fig 11 is the one with the largest projected shadow and, thus, the one with largest altitude. By means of Eq 33, the obtained height is 1.51m. According to Eq 3 the flat floor assumption will, in this case, lead to an error of 0.7m. Thus, the error is not fully neglectable as it is larger than the sensor resolution. However, as the error in this case is 5.8 times the sensor resolution, the highest point of the structure will be misplaced less than 6 bins in the resulting acoustic image. As the acoustic image has 500 Fig 11. Structure with the largest shadow in our dataset. The values r s1 = 10.1m and r s2 = 14.5m have been found by manually selecting the shadow. This example is used to determine the maximum ground range error e max in our dataset due to the flat floor assumption. This maximum error is 0.7m and is not fully neglectable, but it leads only to a 1.16% of misplacement with respect to the whole SSS range. bins, this means only a 1.16% of misplacement with respect to the whole SSS range. Let this error corresponding to the highest structure in the dataset be denoted by e max .
As for the effects of the across-track slope, computing Eqs 6 and 7 using our SSS parameters shows that slopes within the interval s min = −2.24% and s max = 2.39% lead to fully neglectable errors.
The results obtained by configuring the models with our particular sensor parameters are summarized in Table 2. This data shows that the flat floor assumption leads to small errors and, in most cases, to fully neglectable errors in the scenarios where our AUV is deployed and suggests that similar configurations and environments would lead to a similar conclusion.

Swath correction
This Section provides some experimental results showing the effects of the swath correction methods presented in Section Swath correction. The sensor parameters used are those presented in Table 1, except for the altitude h. For the sake of clarity, regions where the AUV altitude changed have been chosen here. Fig 12a shows an example of the ensonification intensity for a whole swath according to the model presented in Section Intensity correction parametrized for our sensor at an altitude h = 6.54m. In this case, bins denote slant ranges. It provides a fair representation of the sensor behavior, as it shows two main intensity peaks, one for each sensor head. The intensity peaks correspond to the main acoustic lobes, but they are also influenced by the distance of each bin to the sensor. Because of that, each peak is not symmetric with respect to its acoustic axis. The central region clearly reflects low ensonification intensity of the blind zone because of its proximity to the main lobe boundary. Fig 12b shows a set of 1000 ensonification intensity curves corresponding to the the beginning of a transect. During the first 330 swaths, the vehicle altitude descended from h = 9.4m to h = 3.9m. From swath 331 onward, the vehicle altitude was almost constant and close to h = 3.9m. As the AUV descends, the two intensity peaks are getting closer and the region corresponding to the blind zone becomes smaller, stabilizing when constant altitude is reached. This is consistent with SSS imagery, in which the blind zone changes depending on the altitude.
By comparing Fig 12 and the swath examplified in Fig 3, it is clear that by removing the ensonification intensity component the resulting swath will be a much more realistic representation of the sea floor.    Fig 15b shows the same data after applying slant and intensity correction. As a result of the former process, the pipe shape is corrected to the straight line it should be. The uneven brightness in the original image is also corrected in that case thanks to the intensity correction.
There are several tasks in which properly knowing the geometry of underwater structures is important. For example, in industrial tasks such as underwater cable or pipe inspection in which wrong geometry estimates may lead to wrong defect detections. Also, in biological surveys to estimate the growing of algae population a proper geometry reconstruction is crucial to obtain accurate measurements. Moreover, in both cases, the structures to be analyzed, either biological or industrial, are usually close to the sea floor, rendering the swath correction even more effective.

The SSS model
This Section provides some illustrative examples of the three approaches presented in Section The SSS model to model the probability of a map cell to be observed.  It can be observed how the Triangular approach leads to results quite similar to those of the Gaussian approach, whilst the Uniform approach produces a very simplistic representation. In all cases the probability decreases with distance because the angular range where the pixel lies is smaller for farther pixels. Table 3 shows the time spent in the probability computation when building the images in Fig 16. As it can be observed, the Triangular model leads to a significant reduction of execution time with respect to the Gaussian approach. Also, it is clear that, in terms of computation time, the Uniform model provides the best results.

The probability map
This Section shows some experimental results obtained when building the probability layer M P using the approach described in Section The probability layer. Fig 17 shows the probability layer M P corresponding to a region of 180m x 90m observed by the SSS in the real subsea environment using (a) P m G , (b) P m T and (c) P m U . As it can be observed, in all cases the probability is large and very close to 1 for those pixels observed by several measurements. Also, it can be observed that the three resulting probability maps are very similar. This suggests that when several readings overlap, the specific model used to compute individual The pipe image is corrected to its actual straight shape. The uneven brightness is also improved due to the intensity correction process. In both cases a red line is overlayed to emphasize the geometric improvement. probabilities has almost no influence. Also, it has to be taken into account that the large size of pixels in this case (30cm x 30cm) lead to similar values when integrating the three proposed PDFs. Fig 18 shows the probability layer of a region of 15m x 15m corresponding to the upper-left corner of the images in Fig 17. The resolution is 5cm. In this case, the differences between the three models are easily appreciable. For example, changes in probability between nearby pixels are smooth in the Gaussian and the Triangular approaches, but not in the Uniform approach. Also, the probability peak close to the acoustic axis is slightly more pronounced in the Triangular approach than in the Gaussian approach.
It can be observed that the probability values are different depending on the resolution. The map can be seen as a discrete representation of a continuous probability field. Larger pixels, thus, accumulate a wider range of probabilities and, thus, have larger values. That is why the probability values are different between Figs 17 and 18. Table 4 summarizes the time consumption when building the two previously shown groups of maps. As it can be observed, the Gaussian approach always leads to the largest computation times and the Uniform approach to the smallest ones. However, the ratio is different to the one shown in Table 3. This suggests that the time spent to evaluate the probabilistic models is very small when compared to the remaining tasks.
Also, the times spent in Scenario A and Scenario B are not proportional, as illustrated by the time per pixel, which is significantly different in both cases. As stated previously, building the map requires evaluating the probabilistic model |S q | times for every pixel, where S q is the set of measurements in which the pixel q is contained. As Scenario B concentrates in a region much smaller than the one of Scenario A, a very large amount of readings can be discarded with the very simple and fast criteria of not overlapping the rectangle defined by the map. This clearly shows that having fast criteria to discard non-informative measurements is crucial in terms of time consumption.
As a conclusion, when building low resolution maps such as those in Fig 17 the specific probabilistic model barely influences the probabilistic layer. Thus, in this case the Uniform approach is a good choice as it is faster.
When building high resolution maps such as the ones in Fig 18, however, the situation changes. On the one hand, the differences between the three models arise, being the Gaussian one the preferred from this point of view. On the other hand, the influence in time consumption due to the model evaluation is larger: in the low resolution map, using the Gaussian approach leads to a 3% increase in computation time with respect to the Uniform approach, but in the high resolution map the execution time increases an 8%. Thus, from the time consumption point of view, the Uniform approach is to be considered. In these cases, deciding the model to use depends on the available computational resources and the number of map cells to compute.

The echo intensity map
Fig 19a shows the echo intensity map built using a resolution of 30cm and corresponding to the whole mapped area. As it can be observed, the approach successfully merges the swaths when they overlap.    The ability of the proposed approach to fuse echo intensities makes it possible to build visually consistent maps of the environment, helping the user to have a complete representation of the observed sea floor from the partial views provided by each SSS measurement. This is especially relevant in scientific and industrial tasks in which the objects of interest may be perceived in very different ways depending on the ensonification angle. In these cases, the ability to fuse different object views into a consistent map is crucial to properly understand submerged geological or industrial structures or coral reefs, for example, where shadows may lead to very different echo intensity profiles depending on the ensonification angle. High-Resolution Underwater Mapping Using Side-Scan Sonar High-Resolution Underwater Mapping Using Side-Scan Sonar Another example is provided in Fig 21. In this image both the raw data and the corresponding echo intensity map are shown. The improvements in resolution as well as the benefits of the echo intensity correction are clearly visible.
As for the time consumption, the results are shown in Table 5. As it can be observed, the mean time per pixel in the largest scenario is smaller than in the other ones. This suggests that, algorithmically, the time spent in operations not depending on the map size is predominant.   Examples of operations not depending on the map size are discarding SSS measurements not influencing the map or computing their coordinates with respect to the global map frame. Also, the number of SSS measurements influencing the smaller regions is proportionally larger than in the large map and so is |S q |, thus leading to proportionally larger time consumptions in these particular smaller maps. The same explanation applies to the differences in time consumption between these results and those shown in Section The probability map. In the two shown smaller regions, it can be observed how the time consumption increases with the resolution and the map size but the time per pixel is similar. This shows that, in similar regions where |S q | is likely to be similar, the mean time per pixel barely changes. Whereas pixels outside the maximum range should be left to zero, the value of those map cells between two measurements could be extrapolated to fill the visual gaps by means of the Geometric approach described in Section Geometric map building.

Geometric map building
We would like to emphasize that these gaps would be rarely observable under our particular sensor configuration. For example, in order to build the image in Fig 22a a specific region where the robot was performing a turn was selected and mapped using a resolution of 5cm. Also, for the gaps to be clearly visible, the opening ϕ was set to 0.5°, six times below the actual one. Using other sensors with lower measurement rate, the gaps may appear more frequently. Fig 22b shows the same image where the gaps have been filled according to the geometric approach. As it can be observed, there is a continuity in most cases between the echo intensity map and the values assigned to gaps by the geometric approach.   High-Resolution Underwater Mapping Using Side-Scan Sonar As for the time consumption, building these two geometric maps spent 4.81 min and 1.16 min, far below the time required to build the echo intensity map corresponding to the same regions (8.87 min and 3.07 min, as shown in Table 5).

Conclusion and Future Work
Three different models to built the probabilistic layer have been experimentally tested. Among them, the Gaussian approach is the best in terms of model accuracy whilst the Uniform one surpasses the other two when considering the computation time. Also, it has been shown that for low resolution maps, the three models lead to similar results. Accordingly, it is reasonable to use the Gaussian model for high resolution maps (i.e. resolutions below the SSS one) and the Uniform or the Triangular otherwise.
The echo intensity map approach has shown to be able to provide accurate maps at better resolutions than those of the SSS by combining overlapping measurements. However, if SSS with low measurement rates are used, some gaps may appear in the map. To avoid this problem, the Geometric approach has been introduced as a method to fill such gaps.
The probabilistic layer M P and the intensity layer M I can be used together to provide the user or the navigation modules with information not only about the sea floor structure but also about how good is that information. This could be used, for example, to decide which areas have to be re-visited in order to reinforce the available information. Fig 24 summarizes this idea. This figure shows the probabilistic map, built using the Gaussian approach, and the echo intensity map of the whole explored area in our experimental setup (750m x 270m) using a resolution δ M = 30cm. In this image, red areas denote regions with high probability of being properly represented in the map and blue areas regions with lower probability.
As for future work, our research is now focused on three key points. First, to improve the presented models to deal with different observation angles. This is important as, depending on the observation angle, the projected shadows change. The second key point is to use the results of this study to perform underwater SLAM using SSS. This is a crucial task with very few related studies providing satisfactory results. Additionally, we are focusing on purely algorithmic aspects in order to reduce the time consumption of the presented approaches.