One-Shot Perception
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT 2024
-
Towards flexible object-centric visual perception, we propose a one-shot, instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of a pretrained vision transformer (ViT) and can obtain keypoints on multiple object instances of arbitrary categories after learning from a single support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, followed by training-free feature enhancement. Best-prototype pairs (BPPs) are searched for in the support and query images based on appearance similarity, yielding instance-unaware candidate keypoints. Then, the graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot; it not only demonstrates cross-category flexibility and instance awareness, but also shows remarkable robustness to domain shift and viewpoint change.
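A minimal sketch of the best-prototype-pair matching step described in the abstract, assuming patch-level ViT features have already been extracted for the support and query images; the function name, shapes, and the mutual-nearest-neighbour criterion below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def best_prototype_pairs(support_feats, query_feats, proto_idx):
    """
    support_feats: [Ns, D] patch features of the support image (e.g. from a pretrained ViT)
    query_feats:   [Nq, D] patch features of the query image
    proto_idx:     indices of support patches annotated as keypoint prototypes
    Returns, for each prototype, the query patch whose feature is mutually most
    similar -- a simple mutual-nearest-neighbour stand-in for the BPP search.
    """
    s = F.normalize(support_feats, dim=-1)
    q = F.normalize(query_feats, dim=-1)
    sim = s @ q.t()                       # [Ns, Nq] cosine similarity

    pairs = []
    for p in proto_idx:
        j = sim[p].argmax().item()        # best query patch for prototype p
        i = sim[:, j].argmax().item()     # best support patch for that query patch
        if i == p:                        # keep only mutual best matches
            pairs.append((p, j, sim[p, j].item()))
    return pairs
```

The returned matches correspond to instance-unaware candidate keypoints; the subsequent graph partitioning step would then group them into per-instance sub-graphs.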
Contour Primitive of Interest Extraction Network Based on Dual-Metric One-Shot Learning for Vision Measurement 2022
-
Although many existing vision measurement systems achieve high performance, they are object-specific and lack flexibility. Toward intelligent vision measurement that can be conveniently reused for novel objects, this article focuses on image geometric feature extraction with one-shot learning ability. We propose a contour primitive of interest (CPI) extraction network with dual metric (CPieNet-DM), which can obtain a designated CPI in a query image of a novel object under the guidance of only one annotated support image. First, the dual-metric learning mechanism is proposed, which not only uses inter-image similarity as guidance but also leverages the intra-image coherency of CPI pixels to facilitate inference. Second, a neural network is designed to infer the CPI map based on the dual metric, and it also predicts the CPI's geometric parameters. Moreover, a dual context aggregator is plugged in to provide awareness of both images' contexts. Third, the network training is jointly supervised by the multiple tasks of dual-metric learning, geometric parameter regression, and CPI extraction. Online hard example mining is used to improve the training outcome. The effectiveness of the proposed methods is validated through a series of experiments.
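A minimal sketch of the joint multi-task objective outlined above (dual-metric guidance, geometric parameter regression, and CPI map extraction with online hard example mining); all names, loss weights, and the OHEM keep-ratio are illustrative assumptions rather than the paper's actual values.

```python
import torch
import torch.nn.functional as F

def ohem_bce(logits, target, keep_ratio=0.25):
    """Per-pixel BCE where only the hardest fraction of pixels contributes."""
    loss = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    k = max(1, int(keep_ratio * loss.numel()))
    hard, _ = loss.flatten().topk(k)
    return hard.mean()

def joint_loss(pred, target, w_metric=1.0, w_geom=1.0, w_cpi=1.0):
    # Inter-image metric: support and query CPI embeddings should agree.
    l_inter = F.mse_loss(pred["query_embed"], pred["support_embed"])
    # Intra-image metric: pixels belonging to the same CPI should stay coherent.
    l_intra = F.mse_loss(pred["pixel_embed"], pred["pixel_embed_center"])
    l_metric = l_inter + l_intra
    # Geometric parameter regression for the designated CPI.
    l_geom = F.smooth_l1_loss(pred["geom"], target["geom"])
    # CPI map extraction supervised with online hard example mining.
    l_cpi = ohem_bce(pred["cpi_logits"], target["cpi_map"])
    return w_metric * l_metric + w_geom * l_geom + w_cpi * l_cpi
```

The three terms mirror the joint supervision described in the abstract; in practice the relative weights would be tuned on a validation set.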
Surgical Perception
Semantic Segmentation in Surgery
INR for Deformation
Surgical Task Automation
Robot Pose Tracking