SAOR: Single-View Articulated Object Reconstruction

Mehmet Aygun
Oisin Mac Aodha

University of Edinburgh
arXiv, 2023


We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild. Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors. Our method only requires estimated object silhouettes and relative depth maps from off-the-shelf pre-trained networks during training. At inference time, given a single-view image, it efficiently outputs an explicit mesh representation.
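Since training uses only estimated silhouettes and relative depth maps from off-the-shelf networks, the supervision reduces to comparing renders of the predicted mesh against these pseudo labels. The sketch below is illustrative, not the paper's exact losses: the function names and the least-squares scale/shift alignment are our assumptions.

```python
import numpy as np

# Illustrative sketch of silhouette + relative-depth supervision
# (assumed formulation; SAOR's exact losses may differ).

def silhouette_loss(rendered_sil, target_sil):
    """Mean squared error between the rendered silhouette of the
    predicted mesh and the off-the-shelf silhouette estimate."""
    return float(np.mean((rendered_sil - target_sil) ** 2))

def relative_depth_loss(rendered_depth, target_depth, mask):
    """Monocular depth predictors give depth only up to scale and shift,
    so align the rendered depth to the estimate inside the object mask
    with least squares before comparing (hypothetical choice)."""
    r = rendered_depth[mask]
    t = target_depth[mask]
    A = np.stack([r, np.ones_like(r)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(np.mean((s * r + b - t) ** 2))
```

For example, a rendered depth that matches the pseudo label up to any scale and shift (`target = 2 * rendered + 1`) yields an effectively zero relative-depth loss, which is the invariance the alignment step provides.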

Single Frame Reconstruction Results

SAOR learns to estimate the 3D shape of highly articulated objects from single images without using any 3D object prior (e.g. a 3D template or skeleton) during training. It also estimates texture, viewpoint, and a 3D articulated part segmentation, all learned in a self-supervised manner.
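The skeleton-free part-based idea can be sketched as follows: each mesh vertex is softly assigned to a set of parts, and each part carries its own rigid transform, so articulated vertices are the assignment-weighted blend of the per-part transformed vertices. The names and details below are illustrative assumptions, not SAOR's exact formulation.

```python
import numpy as np

# Minimal sketch of skeleton-free part-based articulation (assumed
# formulation): no pre-defined skeleton, only learned soft part
# assignments and per-part rigid transforms.

def articulate(vertices, part_logits, rotations, translations):
    """vertices: (V, 3), part_logits: (V, K),
    rotations: (K, 3, 3), translations: (K, 3)."""
    # Softmax over parts -> soft part segmentation weights (V, K).
    w = np.exp(part_logits - part_logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    # Apply every part transform to every vertex -> (K, V, 3).
    per_part = (np.einsum('kij,vj->kvi', rotations, vertices)
                + translations[:, None, :])
    # Blend the transformed copies with the soft assignments -> (V, 3).
    return np.einsum('vk,kvi->vi', w, per_part)
```

With identity rotations and zero translations the mesh is left unchanged, and the soft weights `w` double as the 3D part segmentation mentioned above.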

Per-Frame Predictions

SAOR robustly estimates the 3D shape, viewpoint, and texture of objects from per-frame input, without using any temporal information.

Interactive Single View Reconstructions

Articulation Transfer

SAOR learns to disentangle shape deformation and articulation during training. Given a source image (left), we can transfer articulation from another image (right) by interpolating articulation features, obtaining a plausible articulation transfer (middle).
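Because shape and articulation are disentangled, the transfer itself amounts to blending the articulation features of the two images while keeping the source shape and texture fixed. A minimal sketch of that interpolation, with hypothetical names (the learned decoder that maps features to part transforms is not shown):

```python
import numpy as np

# Hypothetical sketch of articulation transfer: linearly interpolate
# the articulation feature vectors of a source and a target image.

def interpolate_articulation(feat_src, feat_tgt, alpha):
    """alpha=0 keeps the source pose, alpha=1 adopts the target pose;
    intermediate values blend between the two."""
    return (1.0 - alpha) * feat_src + alpha * feat_tgt

# The blended feature would then be decoded into per-part transforms
# and applied to the source shape.
mid = interpolate_articulation(np.zeros(4), np.ones(4), 0.5)
```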