Symmetry-Aware 9D Pose Estimation with Sim(3)-Consistent Feature and Spherical Inception Convolution for Robotic Picking

This project is under review and will be further improved after the paper is published.

Paper (coming soon) | Code (coming soon) | YouTube Video

SSH-Pose generates rotationally symmetric or reflectionally symmetric representations of point clouds from a single viewpoint, as shown in (a). For weakly symmetric objects such as cameras, we still apply symmetry processing, because their translation is defined at the camera's geometric center rather than on the lens axis, and that center lies closer to the overall symmetry plane. In (b), our MHP-Module generates more distinctive features than existing methods such as SecondPose for the handle of the cup, which carries more orientation information. The concept of spherical large-kernel inception convolution is illustrated in (c).

Abstract

Robotic picking relies heavily on accurate object pose estimation. However, current instance-level methods for this task generalize poorly to unseen objects. Category-level methods seek to address this but remain constrained by low accuracy, owing to the complexity of learning in the non-linear Sim(3) space and to intra-class variations. We introduce an effective robotic picking technique with two key innovations: (1) A translation and size estimator, featuring a semantic-guided symmetry-aware module that leverages the robust generalization of a large vision model (LVM) to infer symmetry points, yielding accurate translation and size without shape priors. This result serves as a prior for rotation estimation, which reduces the difficulty of learning in the non-linear Sim(3) space and lays a robust foundation for the inherently more challenging rotation estimation. (2) A feature fusion module, based on our proposed spherical large-kernel inception convolution, that fuses semantic features from the LVM with systematically computed geometric features to extract essential pose features despite intra-class variations and to model long-range dependencies on a spherical surface. This improves rotation estimation while avoiding the heavy computational cost of Transformers. Building on these innovations, we develop a robust robotic picking system capable of handling a variety of objects. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance on benchmark datasets and in challenging real-world scenes.
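To illustrate why a translation and size prior simplifies rotation estimation, here is a minimal NumPy sketch. It is not the SSH-Pose implementation. It assumes a 9D pose of the form x = R (s ⊙ p) + t, with per-axis size s, and known point correspondences, and it uses the classical Kabsch (SVD) solution as a stand-in for the learned rotation estimator. The function names `apply_pose` and `rotation_given_prior` are hypothetical.

```python
import numpy as np


def apply_pose(canonical, rotation, translation, size):
    """Map canonical points (N, 3) into the camera frame: x = R (s * p) + t.

    Assumed 9D pose convention: 3D rotation, 3D translation, per-axis size.
    """
    return (canonical * size) @ rotation.T + translation


def rotation_given_prior(observed, canonical, translation, size):
    """Recover R once translation and size are known (Kabsch stand-in).

    With t and s fixed, the remaining problem is a pure rotation fit,
    which has a closed-form SVD solution when correspondences are known.
    """
    source = canonical * size          # scaled canonical points
    target = observed - translation    # observed points with t removed
    h = source.T @ target              # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T
```

Calling `rotation_given_prior(apply_pose(p, R, t, s), p, t, s)` on noise-free points should return a matrix close to `R`. In practice correspondences are not given, which is why the rotation branch is learned rather than solved in closed form.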

Method

Overview of SSH-Pose. The three background colors mark its three main components, and the two dashed boxes mark its two sub-modules.