Fig 1.
Accuracy, number of parameters, and speed of 3D LiDAR semantic segmentation methods on the SemanticKITTI test set [10].
Blue circles indicate projection-based methods, green pentagons indicate image-based methods, and red squares indicate point-based methods. The total number of network parameters, in millions, is shown in parentheses. Compared with previous methods, the MSCNet method proposed in this paper achieves the best trade-off among accuracy, number of parameters, and speed.
Fig 2.
The architecture of our proposed MSCNet model.
Given LiDAR data, we first use spherical projection to obtain a range image (a). SCMFBlock, BasicBlock, and DCRBlock are then applied to build the multi-scale feature acquisition model (b), where the dashed arrows indicate the type of supervision. Next, the pyramid parsing module is used to obtain representations of different sub-regions, which are upsampled and concatenated with the DCRBlock output features to form the final feature representation, containing both local and global information (c). Finally, this representation is fed into the Feature Fusion Module to obtain a pixel-wise prediction, and a point-wise prediction is obtained by inverse projection (d). The individual blocks are illustrated in Fig 3, and the Feature Fusion Module is illustrated in Fig 4.
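The spherical projection and inverse projection steps named in this caption can be sketched as follows. This is a minimal illustration, not the MSCNet implementation: the image size (64 x 2048), the vertical field of view (+3° to -25°), and the function names `spherical_projection` and `inverse_projection` are assumptions chosen for the example, and the caption does not specify how collisions between points falling in the same pixel are resolved (the sketch keeps the nearest point).

```python
import numpy as np


def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto an (H, W) range image.

    H, W and the field of view are assumed values, not taken from the paper.
    Returns the range image (-1 where no point lands) and the per-point
    (row, col) pixel indices needed for the inverse projection.
    """
    fov_down = np.radians(fov_down_deg)
    fov = np.radians(fov_up_deg) - fov_down  # total vertical field of view

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)       # range of each point
    r_safe = np.maximum(r, 1e-8)             # avoid division by zero

    yaw = np.arctan2(y, x)                               # horizontal angle
    pitch = np.arcsin(np.clip(z / r_safe, -1.0, 1.0))    # vertical angle

    # Map angles to continuous image coordinates, then to integer pixels.
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down) / fov) * H
    cols = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    # Keep the nearest point when several points fall into the same pixel.
    range_image = np.full((H, W), np.inf, dtype=np.float32)
    np.minimum.at(range_image, (rows, cols), r.astype(np.float32))
    range_image[np.isinf(range_image)] = -1.0
    return range_image, rows, cols


def inverse_projection(pixel_labels, rows, cols):
    """Give each point the label predicted for the pixel it projected to."""
    return pixel_labels[rows, cols]


# Three toy points; the first two share a pixel, so the nearer one wins.
points = np.array([[10.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
range_image, rows, cols = spherical_projection(points)
```

Storing `rows` and `cols` during projection makes the inverse step a single lookup, which is one common way to turn a pixel-wise prediction back into a point-wise one.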
Fig 3.
Illustrations of SCMFBlock (a), BasicBlock (b) and DCRBlock (c).
Here, CBR = Conv + BN + LeakyReLU, and DCBR is a CBR with a dilation rate of 2. […] and […] denote the convolution kernel size.
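To make the effect of the dilation rate in DCBR concrete, the following sketch computes the effective footprint of a dilated kernel and the padding that preserves spatial size at stride 1. The helper names are hypothetical, and the kernel size of 3 used in the example is an assumption, since the caption's kernel-size symbols are not available here.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding per side that keeps the output size unchanged at stride 1
    (valid for odd kernel sizes)."""
    return dilation * (kernel_size - 1) // 2


# Assumed 3x3 kernel: plain CBR (dilation 1) versus DCBR (dilation 2).
cbr_span = effective_kernel_size(3, 1)
dcbr_span = effective_kernel_size(3, 2)
dcbr_pad = same_padding(3, 2)
```

Under this assumption, a dilation rate of 2 widens the span of the kernel without adding weights, which is the usual motivation for dilated convolutions in multi-scale blocks.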
Fig 4.
Architecture of the Decoder Module: (a) MobileBlock; (b) Feature Fusion Module.
Table 1.
Evaluation Results on the SemanticKITTI Test Set (Sequences 11 to 21). Point-based Methods: Rows 1 to 10. Voxel-based Methods: Rows 11 to 15. Image-based Methods: Rows 16 to 19. Projection-based Methods: Rows 20 to 31.
Table 2.
Evaluation Results on the SemanticPOSS Test Set (Sequence 02). Image-based Methods: Rows 1 to 3. Projection-based Methods: Rows 4 to 9.
Table 3.
Evaluation Results on the Pandaset Test Set. Image-based Methods: Rows 1 to 3. Projection-based Methods: Rows 4 to 7.
Fig 5.
Qualitative results of the MSCNet model on the SemanticKITTI benchmark (validation sequence 08).
Panels (a) and (b) show the raw data and the corresponding segmentation labels of a LiDAR scan frame, respectively, and (c) shows the segmentation error map of our method for that frame (red indicates incorrect predictions). The colors indicate the semantic classes: cars in blue, roads in purple, vegetation in green, and buildings in yellow.
Fig 6.
Qualitative analysis of the SemanticPOSS validation set.
Panels (a) and (b) show the input data and the corresponding ground-truth segmentation of a LiDAR scan frame, respectively, and (c) shows the segmentation error map of our method for that frame (red indicates incorrect predictions).
Table 4.
Effect of the loss-function hyperparameters.
Table 5.
Impact of each network module.
Table 6.
Effect of range-view resolution.
Table 7.
Multi-scale effects of the DCR block.