MSCNet: Efficient and accurate semantic segmentation of LiDAR data using Multi-scale Convolution

doi:10.1371/journal.pone.0345761

Fig 1.

Accuracy, number of parameters, and speed of 3D LiDAR semantic segmentation methods on the SemanticKITTI test set [10].

Blue circles indicate projection-based methods, green pentagons indicate image-based methods, and red squares indicate point-based methods. The total number of network parameters in millions is shown in parentheses. In comparison with previous methods, the MSCNet method proposed in this paper achieves the best trade-off between accuracy, number of parameters, and speed.

Fig 2.

The architecture of our proposed MSCNet model.

Given LiDAR data, we first use spherical projection to obtain a range image (a). SCMFBlock, BasicBlock, and DCRBlock are then applied to build the multi-scale feature extraction model (b), where the dashed arrows indicate the type of supervision. Next, the pyramid parsing module produces representations of different sub-regions, which are upsampled and concatenated with the DCRBlock output features to form the final feature representation, containing both local and global information (c). Finally, this representation is fed into the Feature Fusion Module to obtain a pixel-wise prediction, and a point-wise prediction is obtained by inverse projection (d). The individual blocks are illustrated in Fig 3, and the Feature Fusion Module is illustrated in Fig 4.
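The projection and inverse-projection steps in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 64 × 2048 image size, and the vertical field of view of +3° to −25° are assumptions chosen to resemble a typical 64-beam sensor, and none of them is stated in the caption.

```python
import math

def spherical_projection(points, width=2048, height=64,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto a range image.

    Returns the range image (rows of floats, -1.0 for empty pixels) and,
    for every input point, the (row, col) pixel it falls on.
    All default parameters are assumed values, not taken from the paper.
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    image = [[-1.0] * width for _ in range(height)]
    pixels = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            pixels.append(None)  # a point at the sensor origin has no direction
            continue
        yaw = math.atan2(y, x)    # horizontal angle
        pitch = math.asin(z / r)  # vertical angle
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch - fov_down) / fov) * height
        col = min(width - 1, max(0, int(math.floor(u))))
        row = min(height - 1, max(0, int(math.floor(v))))
        # If several points share a pixel, keep the closest range.
        if image[row][col] < 0.0 or r < image[row][col]:
            image[row][col] = r
        pixels.append((row, col))
    return image, pixels

def inverse_projection(pixel_labels, pixels):
    """Copy each pixel's predicted label back to the points that map to it."""
    return [None if p is None else pixel_labels[p[0]][p[1]] for p in pixels]
```

Because several points can fall on one pixel, this simple inverse projection gives them all the same label; practical pipelines often add a post-processing step to refine such cases.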

Fig 3.

Illustrations of SCMFBlock (a), BasicBlock (b) and DCRBlock (c).

Here, CBR = Conv + BN + LeakyReLU, and DCBR is a CBR with a dilation rate of 2. and denote the convolution kernel size.
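As a small worked example of what a dilation rate of 2 means for the receptive field, the sketch below computes the effective kernel size of a dilated convolution and the padding that preserves spatial size at stride 1. The helper names are hypothetical, and the kernel size of 3 in the usage lines is an assumed value for illustration only.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def same_padding(kernel_size, dilation):
    """Padding per side that keeps the output size at stride 1 (odd kernels)."""
    return dilation * (kernel_size - 1) // 2

# Assumed 3x3 kernel: dilation 2 widens the span from 3 to 5 pixels
# without adding any weights.
print(effective_kernel_size(3, 1), effective_kernel_size(3, 2))
print(same_padding(3, 2))
```

Under this assumption, a DCBR layer covers a wider neighbourhood than a CBR layer at the same parameter count.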

Fig 4.

Architecture of the Decoder Module: (a) MobileBlock; (b) Feature Fusion Module.

Table 1.

Evaluation Results on the SemanticKITTI Test Set (Sequences 11 to 21). Point-based Methods: Rows 1 to 10. Voxel-based Methods: Rows 11 to 15. Image-based Methods: Rows 16 to 19. Projection-based Methods: Rows 20 to 31.

Table 2.

Evaluation Results on the SemanticPOSS Test Set (Sequence 02). Image-based Methods: Rows 1 to 3. Projection-based Methods: Rows 4 to 9.

Table 3.

Evaluation Results on the Pandaset Test Set. Image-based Methods: Rows 1 to 3. Projection-based Methods: Rows 4 to 7.

Fig 5.

Simple qualitative results of the MSCNet model on the SemanticKITTI benchmark (validation sequence 08).

Panels (a) and (b) show the raw data and the corresponding segmentation labels of a LiDAR scan frame, respectively, and (c) shows the segmentation error map of our method for that frame (red indicates incorrect predictions). Colors indicate semantic classes: cars in blue, roads in purple, vegetation in green, and buildings in yellow.

Fig 6.

Qualitative analysis of the SemanticPOSS validation set.

Panels (a) and (b) show the input data and the corresponding ground-truth segmentation labels for a LiDAR scan frame, and (c) shows the segmentation error map of our method for that frame (red indicates incorrect predictions).
