Details
Original language | English |
---|---|
Pages (from-to) | 1-18 |
Number of pages | 18 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Early online date | 19 Mar 2024 |
Publication status | E-pub ahead of print - 19 Mar 2024 |
Abstract
Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
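The fitting loop outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything named in it is an assumption made for illustration: cuboids are axis-aligned, `bounding_box_solver` is a toy stand-in for the paper's cuboid solver, the `weights` array stands in for the sampling weights a neural network would predict, and zeroing the weights of explained points stands in for conditioning the network on previously detected parts. The distance is the plain point-to-surface distance, not the occlusion-aware metric the paper proposes.

```python
import numpy as np


def cuboid_surface_distance(points, center, half_sizes):
    """Distance from each point to the surface of an axis-aligned cuboid."""
    q = np.abs(points - center) - half_sizes         # per-axis offset from the faces
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(q.max(axis=1), 0.0)          # negative for interior points
    return np.abs(outside + inside)


def bounding_box_solver(sample):
    """Toy stand-in for a cuboid solver: the bounding box of the sample."""
    lo, hi = sample.min(axis=0), sample.max(axis=0)
    return (lo + hi) / 2.0, (hi - lo) / 2.0


def fit_cuboids(points, weights, num_cuboids=2, num_hypotheses=200,
                sample_size=6, threshold=0.05, seed=0):
    """Sequential RANSAC: fit one cuboid at a time to the unexplained points."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float).copy()
    cuboids = []
    for _ in range(num_cuboids):
        # Stop when too few unexplained points remain to draw a sample.
        if np.count_nonzero(weights) < sample_size:
            break
        probs = weights / weights.sum()
        best, best_inliers = None, None
        for _ in range(num_hypotheses):
            # Guided sampling: points with larger weights are drawn more often.
            idx = rng.choice(len(points), size=sample_size, replace=False, p=probs)
            center, half = bounding_box_solver(points[idx])
            dist = cuboid_surface_distance(points, center, half)
            # Count only points that earlier cuboids have not explained.
            inliers = (dist < threshold) & (weights > 0)
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best, best_inliers = (center, half), inliers
        cuboids.append(best)
        weights[best_inliers] = 0.0                  # mark these points as explained
    return cuboids
```

Calling `fit_cuboids(points, np.ones(len(points)))` on an `(N, 3)` point array samples uniformly, which reduces the sketch to ordinary sequential RANSAC. Each returned entry is a `(center, half_sizes)` pair.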
Keywords
- cuboid fitting, Estimation, Image reconstruction, minimal solver, multi-model fitting, Scene abstraction, Shape, shape decomposition, Solid modeling, Surface reconstruction, Three-dimensional displays, Training
ASJC Scopus subject areas
- Computer Science(all)
  - Software
  - Computer Vision and Pattern Recognition
  - Computational Theory and Mathematics
  - Artificial Intelligence
- Mathematics(all)
  - Applied Mathematics
Cite this
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 19.03.2024, p. 1-18.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Robust Shape Fitting for 3D Scene Abstraction
AU - Kluger, Florian
AU - Brachmann, Eric
AU - Yang, Michael Ying
AU - Rosenhahn, Bodo
N1 - Funding Information: This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).
PY - 2024/3/19
Y1 - 2024/3/19
KW - cuboid fitting
KW - Estimation
KW - Image reconstruction
KW - minimal solver
KW - multi-model fitting
KW - Scene abstraction
KW - Shape
KW - shape decomposition
KW - Solid modeling
KW - Surface reconstruction
KW - Three-dimensional displays
KW - Training
UR - http://www.scopus.com/inward/record.url?scp=85188527432&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2403.10452
DO - 10.48550/arXiv.2403.10452
M3 - Article
AN - SCOPUS:85188527432
SP - 1
EP - 18
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
ER -