Fig 1.
The GEBCO 30-arc-second gridded bathymetric dataset.
Although it covers the oceans across the entire globe, its resolution is low and each sea area is sampled coarsely, as seen in the bottom-right close-up.
Fig 2.
Overview of the proposed deep-learning-based image superresolution.
We use a deep neural network for superresolution that takes a low-resolution image as input and yields a high-resolution image as output; in our case, these represent coarse and fine bathymetric charts, respectively. First, in the training phase, we let the network learn how to estimate the high-resolution image from the low-resolution one, using a dataset consisting of many pairs of low- and high-resolution images. This is done by minimizing a loss function, which is defined as the difference between the estimated high-resolution image and the true high-resolution image corresponding to the low-resolution one. Then, in the testing phase, we let the network predict the desired, unknown high-resolution image from each newly given low-resolution image.
Fig 3.
Data augmentation by flipping and rotation.
The circled numbers identify image corners. There are eight possible transformations, which can be obtained by combining the flips in the left-right and up-down directions, and the rotations by 90, 180, and 270 degrees.
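The eight transformations can be enumerated with a few lines of NumPy. The following is a minimal sketch rather than the code used in this work; the function name `dihedral_augment` and the sample patch are illustrative assumptions.

```python
import numpy as np

def dihedral_augment(image):
    """Return the eight flip/rotation variants of a 2-D array."""
    variants = []
    for k in range(4):
        # Rotation by 0, 90, 180, or 270 degrees (counterclockwise in NumPy).
        rotated = np.rot90(image, k)
        variants.append(rotated)
        # The same rotation followed by a left-right flip.
        variants.append(np.fliplr(rotated))
    return variants

# Illustrative 3x3 patch with no symmetry, so all variants differ.
patch = np.arange(9).reshape(3, 3)
augmented = dihedral_augment(patch)
```

The up-down flip does not need to be generated separately: it equals a 180-degree rotation followed by a left-right flip, so it already appears in the list.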
Fig 4.
The architecture of the ESRGAN generator [8], i.e., the deep neural network used in the proposed method.
Conv2D, LReLU, ×β and Upsampling2D denote two-dimensional convolution, leaky rectified linear unit, scaling, and two-dimensional upsampling layers, respectively. The digits on each Conv2D layer indicate its number of filters, while the kernel size is omitted here since it is three for all layers and for both horizontal and vertical axes; also, the strides are one, i.e., convolution is always performed at every pixel. Branches with simple arrows and arrows pointing at ⊕ denote skip connections with concatenation and addition, respectively. The total number of parameters (weights and biases) is 16,732,609.
Fig 5.
Two-dimensional convolution.
⊗ and ⊕ denote multiplication and addition, respectively. Some operations are grayed out for visibility. A patch of the input image (whose size is the same as that of the kernel) is multiplied elementwise by the kernel, and the products are summed into a single pixel of the output image. In practice, the input image may be a single channel of a multi-channel input image; in that case, multiple intermediate output images are produced with different kernel channels and then summed into a single output channel of the final output image.
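The operation in this figure can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the network: it uses a stride of one and no padding (so the output is smaller than the input), it omits the bias term, and the function names `conv2d_single` and `conv2d_multi` are hypothetical. As is usual in deep-learning frameworks, the kernel is not flipped.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide one kernel over one channel (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch with the same size as the kernel.
            patch = image[i:i + kh, j:j + kw]
            # Elementwise multiplication, then summation into one output pixel.
            out[i, j] = np.sum(patch * kernel)
    return out

def conv2d_multi(channels, kernel_channels):
    """Sum the per-channel intermediate outputs into one output channel."""
    return sum(conv2d_single(c, k) for c, k in zip(channels, kernel_channels))

# Illustrative single-channel example: 4x4 image, 3x3 kernel of ones.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
single = conv2d_single(image, kernel)

# Illustrative two-channel example whose contributions cancel out.
multi = conv2d_multi([image, image], [kernel, -kernel])
```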
Fig 6.
Leaky Rectified Linear Unit (LReLU).
The parameter α controls the amount of negative-value leaking.
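As a minimal sketch, the LReLU passes non-negative inputs unchanged and multiplies negative inputs by α. The default value `alpha=0.2` below is an assumption chosen for illustration; the caption does not state the value used.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: x for x >= 0, alpha * x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, alpha * x)

# Illustrative inputs; alpha is set explicitly here.
activated = leaky_relu([-2.0, 0.0, 3.0], alpha=0.1)
```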
Fig 7.
Two-dimensional upsampling by nearest-neighbor interpolation.
The arrows indicate copying of pixel values. While only the top-left pixel of the original image is considered here, every pixel is processed in the same manner in the actual operation.
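Nearest-neighbor upsampling copies each pixel into a block of identical pixels. A minimal NumPy sketch, assuming an integer scaling factor (the function name `upsample_nearest` is illustrative):

```python
import numpy as np

def upsample_nearest(image, factor=2):
    """Copy every pixel into a factor x factor block."""
    rows_repeated = np.repeat(image, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)

# Illustrative 2x2 image upsampled to 4x4.
small = np.array([[1, 2],
                  [3, 4]])
large = upsample_nearest(small, factor=2)
```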
Fig 8.
Visualized images of three testing samples as the results of superresolution.
Each row corresponds to a single testing sample, and the colorbar on the left of the row shows its depth value range in meters. The columns labeled LR, Baseline, Proposed, and HR correspond to the input low-resolution image, the high-resolution images estimated by the baseline and proposed methods, and the true high-resolution image, respectively. The numbers in the estimated images (Baseline and Proposed) are the root mean squared error (RMSE) values against the corresponding true high-resolution images (HR). Note that these values are not averages over all samples in our dataset; each is the square root of the mean squared error over all pixels of one image (column) for one sample (row). The value ranges and the RMSE values were computed after denormalizing the images.
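The per-image RMSE described here can be sketched as follows. The normalization scheme is not specified in this caption, so the linear form in `denormalize` and its `offset` and `scale` parameters are hypothetical placeholders, as are the sample values.

```python
import numpy as np

def denormalize(image, offset, scale):
    """Undo a hypothetical linear normalization: normalized = (depth - offset) / scale."""
    return np.asarray(image, dtype=float) * scale + offset

def rmse(estimate, truth):
    """Square root of the mean squared error over all pixels of one image."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative normalized images; the estimate is off by 0.1 at every pixel.
truth_norm = np.array([[0.0, 0.5], [1.0, 0.5]])
estimate_norm = truth_norm + 0.1

# Denormalize first, then compute the error in meters.
truth_m = denormalize(truth_norm, offset=-4000.0, scale=1000.0)
estimate_m = denormalize(estimate_norm, offset=-4000.0, scale=1000.0)
error_m = rmse(estimate_m, truth_m)
```

With these illustrative numbers, a uniform error of 0.1 in normalized units becomes an RMSE of about 100 m, which shows why the order of denormalization and error computation matters for the reported values.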
Fig 9.
Differences of estimated images by the baseline and proposed methods from the corresponding true high-resolution images in Fig 8.
For visibility, symmetrical logarithmic scaling is employed in coloring. The number in each image represents the corresponding RMSE value (i.e., the root mean square of its pixel values).
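Symmetrical logarithmic scaling compresses large magnitudes while preserving the sign of each difference. The sketch below shows one common form of such a transform; it is an assumption for illustration, and the figure may have been produced with a different variant (for example, a plotting library's built-in symmetric-log normalization). The parameter `linthresh` is a hypothetical name for the scale below which the mapping is approximately linear.

```python
import numpy as np

def symlog(x, linthresh=1.0):
    """Sign-preserving logarithmic scaling: sign(x) * log10(1 + |x| / linthresh)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log10(1.0 + np.abs(x) / linthresh)

# Illustrative signed differences in meters.
differences = np.array([-99.0, -9.0, 0.0, 9.0, 99.0])
scaled = symlog(differences)
```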
Fig 10.
High-resolution bathymetric map predicted by the proposed method.
Blue and red boxes contain estimated high-resolution images in the training samples (347 areas) and the testing samples (87 areas), respectively.
Table 1.
Average PSNR and SSIM ± standard deviation over all testing samples.
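For reference, the PSNR and its summary statistics can be computed as sketched below, using the standard definition PSNR = 10 log10(R² / MSE), where R is the data range. The value of `data_range`, the sample images, and the use of the population standard deviation (NumPy's default) are assumptions made for illustration; SSIM is omitted for brevity.

```python
import numpy as np

def psnr(estimate, truth, data_range):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mse = np.mean((estimate - truth) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Illustrative testing samples: (estimate, truth) pairs with values in [0, 1].
truth = np.zeros((4, 4))
samples = [(truth + 0.1, truth), (truth + 0.01, truth)]

scores = np.array([psnr(est, ref, data_range=1.0) for est, ref in samples])
mean_psnr = float(np.mean(scores))
std_psnr = float(np.std(scores))  # population standard deviation (ddof=0)
```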
Table 2.
Average PSNR of the proposed method over the testing samples in different depth ranges.
Table 3.
Average PSNR [dB] of the proposed method over all testing samples for a scaling factor of two, with and without data-preprocessing techniques.
Table 4.
The average PSNR of the proposed method over the testing samples for different image sizes.