Fig 1.
Our LGMMfusion framework first extracts features from the LiDAR point cloud and the camera images through their respective backbone networks.
The accurate depth information from the LiDAR then guides the multi-view images in generating image BEV representations, which are finally fused with the LiDAR BEVs into a unified representation.
Fig 2.
Multi-head multi-scale self- and cross-attention block.
Fig 3.
Multi-head adaptive cross-attention block.
Fig 4.
In the BEV coordinate system, we construct a three-dimensional grid of sampling points.
The illustrations show these 3D grid points projected onto the 2D image planes of the different camera views.
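The projection described above can be sketched as a standard pinhole-camera transform: each 3D grid point is moved into a camera frame by an extrinsic rotation and translation, then mapped to pixel coordinates by the intrinsics. The grid extents, intrinsic matrix `K`, and extrinsics `R`, `t` below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 points from the ego/BEV frame onto a camera image plane.
    K: 3x3 intrinsics; R, t: ego-to-camera rotation and translation."""
    cam = points_3d @ R.T + t          # transform into the camera frame
    in_front = cam[:, 2] > 1e-6        # keep only points with positive depth
    uvw = cam @ K.T                    # apply the intrinsic matrix
    uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide -> pixel coords
    return uv, in_front

# Hypothetical 3D grid of BEV sampling points (x forward, y left, z up)
xs, ys, zs = np.meshgrid(np.arange(1, 5), np.arange(-2, 3),
                         np.arange(0, 2), indexing="ij")
grid = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3).astype(float)

# Hypothetical front camera: intrinsics and an ego-to-camera axis swap
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],   # camera x = -ego y
              [0.0, 0.0, -1.0],   # camera y = -ego z
              [1.0, 0.0, 0.0]])   # camera z (depth) = ego x
t = np.zeros(3)

uv, valid = project_points(grid, K, R, t)
```

Repeating this per camera, with each camera's own `K`, `R`, `t`, yields the multi-view projections that the figure visualizes.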
Table 1.
Results on the nuScenes validation set. The modalities are camera (C) and LiDAR (L).
Table 2.
Results on the nuScenes validation set. The methods are compared by modality (Mod), using LiDAR (L) and camera (C). Performance is evaluated both overall (mAP, NDS) and per class [75]. The classes are Car, Truck (Tru), Construction Vehicle (C.V.), Bus, Trailer (Tra), Barrier (Bar), Motorcycle (Motor), Bike, Pedestrian (Ped.), and Traffic Cone (T.C.). The small-object categories (pedestrian, traffic cone, and bicycle) show the most notable mAP improvements.
Fig 5.
Examples of 3D annotations; each row represents a scene.
The first row shows a tunnel during the day, and the second row a tunnel at night.
Table 3.
Extended ablation study results on the nuScenes validation set. The table compares LGMMfusion variants, covering the impact of the image BEV (I-BEV), the BEV query (BEV-Q), the attention modules (MHMS-SA and MHA-CA), and the attention parameters (the number of heads H and scales S). The I-BEV variants differ in their use of BatchNorm (BN) and ReLU activations.