StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS – Technology Org

Transparent objects are common in everyday life, but robots struggle to estimate their poses.

Commonly used depth sensors rarely produce high-quality depth maps of transparent objects, and RGB data often exhibits severe content aliasing caused by the transparent material.

Transparent sphere – illustrative photo. Image credit: Pxhere, CC0 Public Domain

A recent paper on arXiv.org presents StereoPose, a novel stereo image framework for category-level 6D transparent object pose estimation.

The approach uses stereo images to model object shape implicitly instead of relying on an explicit object point cloud. The researchers define a back-view normalized object coordinate space (NOCS) map for transparent objects, which reduces the negative effect of image content aliasing on pose estimation.
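
To make the idea of front-view and back-view coordinates concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes the object is a sphere centred at the object-frame origin, that the camera ray is already expressed in the object frame, that refraction is ignored, and that NOCS normalization scales the bounding-box diagonal to 1 and centres the object at (0.5, 0.5, 0.5). All function names are hypothetical. The front-view value is taken where the ray enters the surface and the back-view value where it leaves.

```python
import math


def ray_sphere_hits(origin, direction, radius):
    """Return the (entry, exit) points of a ray through an origin-centred sphere.

    Returns None when the ray misses the sphere or starts inside/behind it.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]          # unit viewing direction
    b = sum(o * k for o, k in zip(origin, d))  # dot(origin, d)
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c                           # discriminant of t^2 + 2bt + c = 0
    if disc < 0:
        return None
    root = math.sqrt(disc)
    t_entry, t_exit = -b - root, -b + root
    if t_entry < 0:
        return None
    entry = [o + t_entry * k for o, k in zip(origin, d)]
    exit_ = [o + t_exit * k for o, k in zip(origin, d)]
    return entry, exit_


def to_nocs(point, radius):
    """Map an object-frame point into the unit cube (bounding-box diagonal = 1)."""
    diagonal = 2.0 * radius * math.sqrt(3.0)
    return [p / diagonal + 0.5 for p in point]


def front_back_nocs(origin, direction, radius):
    """Front-view and back-view NOCS values seen along one camera ray."""
    hits = ray_sphere_hits(origin, direction, radius)
    if hits is None:
        return None
    entry, exit_ = hits
    return to_nocs(entry, radius), to_nocs(exit_, radius)
```

For a camera at (0, 0, -5) looking along +z at a sphere of radius 1, `front_back_nocs([0, 0, -5], [0, 0, 1], 1.0)` gives a front-view value of roughly (0.5, 0.5, 0.21) and a back-view value of roughly (0.5, 0.5, 0.79). For an opaque object only the first would be visible; for a transparent one, both surfaces contribute to the image.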

Experiments on the public TOD dataset show that StereoPose outperforms existing methods.

Most existing methods for category-level pose estimation rely on object point clouds. However, when considering transparent objects, depth cameras are usually not able to capture meaningful data, resulting in point clouds with severe artifacts. Without a high-quality point cloud, existing methods are not applicable to challenging transparent objects. To tackle this problem, we present StereoPose, a novel stereo image framework for category-level object pose estimation, ideally suited for transparent objects. For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement. StereoPose then estimates object pose based on representation in the normalized object coordinate space (NOCS). To address the issue of image content aliasing, we further define a back-view NOCS map for the transparent object. The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation. To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions. Extensive experiments on the public TOD dataset demonstrate the superiority of the proposed StereoPose framework for category-level 6D transparent object pose estimation.
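
The abstract does not spell out how the epipolar loss is defined, so the following Python sketch only illustrates the geometric constraint that a stereo-consistency term could build on. It assumes an ideal rectified pinhole stereo pair (same focal length and principal point, right camera shifted by `baseline` along x); every name and parameter here is hypothetical. In that setting, the two projections of a 3D point share the same image row, and depth follows from the horizontal disparity.

```python
def project(point, focal, cx, cy, camera_x=0.0):
    """Pinhole projection of a 3D point for a camera located at (camera_x, 0, 0)."""
    x, y, z = point
    u = focal * (x - camera_x) / z + cx
    v = focal * y / z + cy
    return u, v


def stereo_pair(point, focal, cx, cy, baseline):
    """Project one 3D point into the left and right rectified views."""
    left = project(point, focal, cx, cy)
    right = project(point, focal, cx, cy, camera_x=baseline)
    return left, right


def depth_from_disparity(u_left, u_right, focal, baseline):
    """Recover depth Z = f * b / (u_left - u_right) for a rectified pair."""
    return focal * baseline / (u_left - u_right)


def mean_row_residual(matches):
    """Mean absolute row difference over matched (left, right) pixel pairs.

    Zero for geometrically consistent matches in a rectified pair; a value
    like this is one possible ingredient of a consistency penalty.
    """
    return sum(abs(left[1] - right[1]) for left, right in matches) / len(matches)
```

With `focal=500`, `cx=320`, `cy=240`, and `baseline=0.1`, the point (0.1, -0.05, 2.0) projects to the same row in both views with a disparity of 25 pixels, from which `depth_from_disparity` recovers a depth of 2.0.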

Research article: Chen, K., James, S., Sui, C., Liu, Y.-H., Abbeel, P., and Dou, Q., “StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS”, 2022. Link: https://arxiv.org/abs/2211.01644
Project page: https://appsrv.cse.cuhk.edu.hk/~kaichen/stereopose.html

