Fig 1.

Comparison of bounding-box detection and point detection on the Tower dataset, collected from practical applications.

The upper image shows the results of the YOLOv7 bounding-box detection method, and the lower image shows the results of our point-based detection method; the point-based method performs better. All faces are blurred in Fig 1 for privacy preservation.

Fig 2.

Overall framework of point-based crowd counting with contrastive learning.

The blue dashed box includes four different-sized feature layers from the VGG backbone network. The backbone network section can be replaced with other structures such as ResNet. The dashed box containing the projection head, contrastive loss, and point matching is used only during the training process.
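The caption does not specify the form of the projection head or the contrastive loss. The sketch below is hypothetical: it assumes a two-layer MLP head with L2-normalised outputs and an InfoNCE-style loss over one positive and several negatives. The names `projection_head`, `info_nce`, and the `temperature` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def projection_head(features, w1, w2):
    """Hypothetical projection head: Linear -> ReLU -> Linear, then L2-normalise rows."""
    hidden = np.maximum(features @ w1, 0.0)  # ReLU
    z = hidden @ w2
    # Small epsilon guards against division by zero for an all-zero row
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Assumed InfoNCE loss for one anchor: -log(exp(s+) / sum of exp over all scores)."""
    pos = np.dot(anchor, positive) / temperature
    neg = negatives @ anchor / temperature
    logits = np.concatenate(([pos], neg))
    # Numerically stable log-sum-exp over positive and negative scores
    m = logits.max()
    log_denom = m + np.log(np.exp(logits - m).sum())
    return float(log_denom - pos)
```

Because the positive score also appears in the denominator, the loss is always non-negative and shrinks as the anchor moves closer to its positive and away from the negatives. Since the head is used only during training, it can be dropped at inference.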

Fig 3.

Structure of the multi-scale feature fusion module.

L, M, and H represent Low-level, Medium-level, and High-level features, respectively. W1, W2, and W3 are learnable weights for features at different levels. The symbol Σ denotes the element-wise weighted summation operation.
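A minimal sketch of the weighted summation, assuming the L, M, and H maps already share a channel count and are brought to a common spatial size by nearest-neighbour upsampling (the resizing step and the helper names `upsample_nearest` and `weighted_fusion` are assumptions; the caption only states the element-wise weighted sum with learnable W1, W2, W3):

```python
import numpy as np

def upsample_nearest(x, factor):
    """Assumed resizing step: nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def weighted_fusion(low, mid, high, w):
    """Element-wise weighted sum W1*L + W2*M + W3*H of same-shaped (C, H, W) feature maps."""
    if not (low.shape == mid.shape == high.shape):
        raise ValueError("feature maps must share shape (C, H, W)")
    return w[0] * low + w[1] * mid + w[2] * high
```

In training, `w` would be learnable parameters; here they are fixed numbers for illustration, e.g. `weighted_fusion(low, upsample_nearest(mid, 2), upsample_nearest(high, 4), np.array([0.5, 0.3, 0.2]))` for maps at strides 1, 2, and 4 relative to L.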

Table 1.

The overall performance of our framework.

Fig 4.

Visualization of our method on the ShanghaiTech Part A and Part B datasets.

The left image is from ShanghaiTech Part A, with a predicted crowd count of 375; the right image is from Part B, with a predicted crowd count of 18. All faces are blurred in Fig 4 for privacy preservation.

Fig 5.

Example visualization results of our method on the Tower dataset.

The left image shows a close-up view of a high-speed rail station exit, with a predicted crowd count of 2; the right image shows a distant view of a street, with a predicted crowd count of 33. All faces are blurred in Fig 5 for privacy preservation.

Table 2.

Ablation study on projection head.

Table 3.

Evaluation of the effectiveness of the multi-scale feature fusion module (MSFM).

Table 4.

Evaluation of the effectiveness of contrastive loss.

Fig 6.

MAE of our method on the SHTechA crowd counting dataset for different patch numbers.

The patch number is the number of samples cropped from a single image for contrastive learning.
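To make the patch-number parameter concrete, here is a hypothetical sketch that crops `patch_number` random square patches from one image. Uniform random placement and a fixed square `patch_size` are assumptions; the caption does not describe how patches are positioned or sized.

```python
import numpy as np

def crop_random_patches(image, patch_number, patch_size, rng=None):
    """Crop `patch_number` random square patches of side `patch_size` from an (H, W, C) image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if patch_number < 1:
        raise ValueError("patch_number must be at least 1")
    if patch_size > h or patch_size > w:
        raise ValueError("patch_size exceeds image size")
    # Upper bound is exclusive, so every patch fits entirely inside the image
    tops = rng.integers(0, h - patch_size + 1, size=patch_number)
    lefts = rng.integers(0, w - patch_size + 1, size=patch_number)
    return np.stack([image[t:t + patch_size, l:l + patch_size] for t, l in zip(tops, lefts)])
```

Each call returns an array of shape `(patch_number, patch_size, patch_size, C)`; a larger patch number yields more samples per image for the contrastive loss at the cost of more computation per training step.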

Fig 7.

t-SNE maps of MSFM features from different models; blue and red dots denote positive and negative samples, respectively.

Left: features from the MSFM model trained with contrastive learning; right: features from the baseline model.

Table 5.

Comparison of parameter count (M) and inference speed (s per 100 images).
