Fig 1.
CS-robot detecting social distancing breaches.
Our robot detecting non-compliance with social distancing norms, classifying non-compliant pedestrians into groups, and autonomously navigating to the static group with the most people in it (a group of 3 people in this scenario). The robot encourages the non-compliant pedestrians to move apart and maintain at least 2 meters of social distance by displaying a message on its mounted screen. Our CS-robot also captures thermal images of the scene and transmits them to appropriate security/healthcare personnel.
Fig 2.
Criteria for a social distancing breach.
(a): The criteria we use to detect whether two pedestrians violate the social distance constraint. The pedestrians are represented as circles in two different scenarios. The increasing size of the circles denotes the passage of time. The green circles represent time instants when the pedestrians were more than 2 meters apart, and the red circles represent instants when they were closer than 2 meters. Top: Two pedestrians passing each other. This scenario is not reported as a breach because the pedestrians are closer than 2 meters only briefly. Bottom: Two pedestrians meeting and walking together. This scenario is reported as a breach of social distancing norms. (b): A top-down view of how non-compliant pedestrians (denoted as red circles) are classified into groups. The numbers beside the circles represent the pedestrian IDs output by YOLOv3. The compliant pedestrians (green circles) are not classified into groups, as the robot does not have to encourage them to maintain the appropriate social distance.
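The two criteria in this caption can be illustrated with a minimal sketch. It assumes pedestrian positions are already available in meters for each time step. The names `breaching_pairs` and `group_pedestrians` and the value of `MIN_BREACH_FRAMES` are hypothetical: the caption states only that short encounters are not reported, not how long a breach must last. Grouping is shown here as connected components over breaching pairs, which is one plausible reading of panel (b) rather than the confirmed method.

```python
from itertools import combinations
from math import hypot

DIST_THRESHOLD_M = 2.0  # from the caption: closer than 2 meters is too close
MIN_BREACH_FRAMES = 3   # assumed value; the caption gives no duration threshold


def breaching_pairs(frames, min_frames=MIN_BREACH_FRAMES, dist=DIST_THRESHOLD_M):
    """Return ID pairs that stayed closer than `dist` for `min_frames` consecutive frames.

    frames: list of dicts {pedestrian_id: (x, y)}, one dict per time step.
    """
    streak = {}      # pair -> number of consecutive frames spent too close
    breaches = set()
    for positions in frames:
        close_now = set()
        for a, b in combinations(sorted(positions), 2):
            (xa, ya), (xb, yb) = positions[a], positions[b]
            if hypot(xa - xb, ya - yb) < dist:
                close_now.add((a, b))
        # Pairs that moved apart drop out, which resets their streak.
        streak = {pair: streak.get(pair, 0) + 1 for pair in close_now}
        breaches.update(pair for pair, n in streak.items() if n >= min_frames)
    return breaches


def group_pedestrians(pairs):
    """Merge breaching pairs into groups (connected components, union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Pedestrians 1 and 2 pass each other; 3, 4 and 5 walk together.
frames = [
    {1: (0, 0), 2: (5, 0), 3: (10, 0), 4: (11, 0), 5: (12, 0)},
    {1: (2, 0), 2: (3, 0), 3: (10, 1), 4: (11, 1), 5: (12, 1)},
    {1: (4, 0), 2: (1, 0), 3: (10, 2), 4: (11, 2), 5: (12, 2)},
]
pairs = breaching_pairs(frames)
groups = group_pedestrians(pairs)
```

In this example the passing pair (1, 2) is close for only one frame and is not reported, while pedestrians 3, 4 and 5 end up in a single group even though 3 and 5 are exactly 2 meters apart, because both are too close to pedestrian 4.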
Fig 3.
Overall architecture of social distance monitoring using CS-robot. The main components are: (i) Tracking and localizing pedestrians; (ii) Estimating pairwise distances between pedestrians; (iii) Classifying pedestrians into groups; (iv) Computing a goal for the robot based on whether the group is static or dynamic; (v) Using a hybrid collision avoidance method to navigate towards the goal; (vi) Displaying an alert message to the non-compliant pedestrians to encourage them to move apart; (vii) Transmitting the thermal image and bounding boxes of detected people to security/healthcare personnel.
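Step (iv) for static groups can be sketched as follows. This is an illustration under stated assumptions, not the paper's goal computation: the speed threshold `STATIC_SPEED_MPS`, the use of the group centroid as the goal, and the function names are all hypothetical. The only behavior taken from the captions is that the robot navigates to the static group with the most people. Dynamic groups are not handled here.

```python
from math import hypot

STATIC_SPEED_MPS = 0.1  # assumed threshold for calling a group "static"


def centroid(points):
    """Mean (x, y) position of a list of points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def is_static(prev_points, curr_points, dt, max_speed=STATIC_SPEED_MPS):
    """A group is treated as static if its centroid moved slower than max_speed."""
    (px, py), (cx, cy) = centroid(prev_points), centroid(curr_points)
    return hypot(cx - px, cy - py) / dt <= max_speed


def pick_goal(groups, dt):
    """Return the centroid of the largest static group, or None if none is static.

    groups: list of (prev_points, curr_points) tuples, one per group, where each
    element is a list of (x, y) member positions in meters, sampled dt seconds apart.
    """
    static = [curr for prev, curr in groups if is_static(prev, curr, dt)]
    if not static:
        return None
    return centroid(max(static, key=len))


group_a = ([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0), (2, 0)])                  # 3 people, static
group_b = ([(5, 5), (6, 5), (7, 5), (8, 5)], [(6, 5), (7, 5), (8, 5), (9, 5)])  # 4 people, walking
group_c = ([(0, 9), (1, 9)], [(0, 9), (1, 9)])                                  # 2 people, static
goal = pick_goal([group_a, group_b, group_c], dt=1.0)
```

Here the four-person group is skipped because it is moving, so the goal is the centroid of the three-person static group.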
Fig 4.
Pedestrian localization using RGB-D camera.
Left: Two pedestrians detected in the RGB image of the robot’s RGB-D camera with the bounding box centroids marked in pink and green. Right: The same bounding boxes superimposed over the depth image from the RGB-D camera. The pedestrians are localized and the distance between them is estimated by the method detailed above.
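As a rough illustration of localization from a depth image, the sketch below averages the depth readings inside a detection bounding box and back-projects the box centroid with a pinhole camera model. The averaging follows the description given with the localization accuracy results; the pinhole back-projection, the intrinsics `fx` and `cx`, and the function name `localize` are assumptions made for this example.

```python
from math import hypot


def localize(depth, box, fx, cx):
    """Estimate a pedestrian's (lateral, forward) position in the camera frame.

    depth: 2D list of depth readings in meters; 0 means no reading.
    box:   (x_min, y_min, x_max, y_max) in pixels, max values exclusive.
    fx, cx: assumed pinhole intrinsics (focal length and principal point, pixels).
    """
    x0, y0, x1, y1 = box
    readings = [
        depth[r][c]
        for r in range(y0, y1)
        for c in range(x0, x1)
        if depth[r][c] > 0  # skip pixels without a valid depth reading
    ]
    if not readings:
        return None
    z = sum(readings) / len(readings)  # averaged depth within the bounding box
    u = (x0 + x1) / 2                  # horizontal centroid of the box
    x = (u - cx) * z / fx              # pinhole back-projection
    return (x, z)


# Toy 4x8 depth image in which every pixel reads 2 meters.
depth_image = [[2.0] * 8 for _ in range(4)]
left = localize(depth_image, (0, 0, 2, 4), fx=2.0, cx=4.0)
right = localize(depth_image, (6, 0, 8, 4), fx=2.0, cx=4.0)
separation = hypot(left[0] - right[0], left[1] - right[1])
```

Because the estimate averages every valid reading in the box, background pixels inside a loose bounding box bias the result, which is consistent with the larger RGB-D localization errors reported in Fig 7.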
Fig 5.
Homography and coordinate frames.
a. The angled view from the CCTV camera, with the homography rectangle marked in red and its corners numbered. The green dots mark the points corresponding to a person’s feet in this view. b. The top view of the homography rectangle after transformation, with the origin of the top-view coordinate system marked as o_top. The coordinates of the feet points are also transformed using the homography matrix. c. A map of the robot’s environment with free space denoted in gray and obstacles denoted in black, with a coordinate frame at origin o_map. The homography rectangle is marked in red and the ground plane coordinate system is shown with the origin o_gnd.
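The transformation of feet points from the angled view to the top view can be sketched as a projective mapping with a 3x3 homography matrix. The matrix values, the `meters_per_pixel` scale, and the function names below are illustrative assumptions. In practice the matrix would be estimated from the four numbered corners of the rectangle, for example with OpenCV's `cv2.getPerspectiveTransform`.

```python
def apply_homography(H, point):
    """Map an image point (u, v) through a 3x3 homography H (nested lists)."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (x / w, y / w)  # divide by w to return to 2D coordinates


def top_view_to_ground(point, meters_per_pixel):
    """Scale a top-view pixel coordinate to meters (assumed uniform scale)."""
    return (point[0] * meters_per_pixel, point[1] * meters_per_pixel)


# Made-up matrices chosen so the arithmetic is easy to check by hand.
H_affine = [[2, 0, 10], [0, 2, 20], [0, 0, 2]]
H_projective = [[1, 0, 0], [0, 1, 0], [0, 0.5, 1]]

feet_top = apply_homography(H_affine, (4, 6))
feet_ground = top_view_to_ground(feet_top, meters_per_pixel=0.5)
feet_projective = apply_homography(H_projective, (2, 2))
```

The division by `w` is what distinguishes a homography from an affine map: points farther up the image (larger `v` in `H_projective`) are scaled down more, which undoes the perspective foreshortening of the angled view.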
Fig 6.
Thermal images generated by the thermal camera, which are wirelessly transmitted to appropriate security/healthcare personnel. The temperature signatures of the people remain constant irrespective of their orientations. We intentionally keep a human in the loop to monitor people’s temperature signatures, and to protect people’s privacy we do not perform any form of facial recognition. Pedestrians are detected on the thermal image to aid the personnel responsible for monitoring the area.
Table 1.
Comparison of breach detection and enforcement.
Fig 7.
Localization accuracy results.
Plots of ground truth (blue dots) versus estimated pedestrian locations (red dots) when using the robot’s RealSense camera and the static CCTV camera, which has a wider FOV. a. The estimates from the RealSense camera tend to have slightly higher errors because we localize pedestrians using proximity values averaged within their detection bounding boxes, and this average is affected by the size of the bounding boxes. b. Localization using the data from the CCTV camera is more accurate because it tracks a person’s feet. This method is not affected by a person’s orientation. We observe that in both cases, the localization errors are within the acceptable range of 0.3 meters.
Table 2.
The percentage of breaches detected by the robot-CCTV hybrid setup with different numbers of walking pedestrians.
Fig 8.
Graphs of our breach detection’s computation time versus the number of detected pedestrians.
The graph shows the computation times of our breach detection implementations for the RGB-D camera, running on a robot-mounted laptop, and for the wall-mounted CCTV camera, running on a desktop (see the Implementation section for specifications). The values were recorded while evaluating scenario 4, with human-shaped cardboard cutouts added to bring the total number of detected pedestrians to 13. We observe that, given the FOV and sensing regions of the two cameras, the corresponding computation times on the laptop and desktop are satisfactory.
Table 3.
Tracking duration with varying pedestrian velocities.
Fig 9.
Improved pedestrian tracking using CCTV camera.
Trajectories of two non-compliant pedestrians (in red) and the robot pursuing them (in green) in the mapped environment shown in Fig 5c. The pink and blue colors denote the static obstacles in the environment. a. The robot uses only its RGB-D camera to track the pedestrians and pursues them successfully when they move along a smooth trajectory. b. The robot’s RGB-D camera is unable to track the pedestrians when they make a sudden sharp turn. c. When the CCTV camera is used to track the pedestrians, the robot follows their trajectories more closely. d. Pedestrians making sharp and sudden turns can also be tracked. The black line denotes the point at which the pedestrians leave the CCTV camera’s FOV; from this point on, the RGB-D camera tracks the pedestrians, and sharp turns again become a challenge.