Robust Shape Fitting for 3D Scene Abstraction

Research output: Contribution to journal > Article > Research > peer review

Authors

  • Florian Kluger
  • Eric Brachmann
  • Michael Ying Yang
  • Bodo Rosenhahn

Research Organisations

External Research Organisations

  • Niantic Inc.
  • University of Bath

Details

Original language: English
Pages (from-to): 1-18
Number of pages: 18
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Early online date: 19 Mar 2024
Publication status: E-pub ahead of print - 19 Mar 2024

Abstract

Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
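The fitting loop summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: cuboids are axis-aligned, the per-point sampling weights (which the paper obtains from a neural network) are passed in as a plain array, and the hypothetical `bounding_box_solver` stands in for the paper's cuboid solver. The distance used here is the naive point-to-surface distance, not the occlusion-aware metric proposed in the paper. All function and parameter names are illustrative.

```python
import numpy as np


def cuboid_surface_distance(points, center, half_extents):
    """Unsigned distance from each point to the surface of an axis-aligned cuboid."""
    q = np.abs(points - center) - half_extents             # per-axis offset from the faces
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)   # non-zero only outside the box
    inside = np.minimum(q.max(axis=1), 0.0)                # negative only inside the box
    return np.abs(outside + inside)


def bounding_box_solver(sample):
    """Hypothetical stand-in solver: tight axis-aligned box around the sampled points."""
    lo, hi = sample.min(axis=0), sample.max(axis=0)
    return (lo + hi) / 2.0, (hi - lo) / 2.0


def guided_ransac(points, weights, n_hypotheses=200, sample_size=6,
                  threshold=0.05, seed=0):
    """Sample point sets according to `weights`, fit a cuboid, keep the best by inlier count."""
    rng = np.random.default_rng(seed)
    probs = weights / weights.sum()
    best, best_inliers = None, -1
    for _ in range(n_hypotheses):
        idx = rng.choice(len(points), size=sample_size, replace=False, p=probs)
        center, half = bounding_box_solver(points[idx])
        dist = cuboid_surface_distance(points, center, half)
        inliers = int((dist < threshold).sum())
        if inliers > best_inliers:
            best, best_inliers = (center, half), inliers
    return best, best_inliers
```

Calling `guided_ransac(points, weights)` on an `(N, 3)` point array with non-negative weights returns the best `(center, half_extents)` pair and its inlier count. Fitting several cuboids one by one, as the abstract describes, would repeat this call with weights that depend on the cuboids already found.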

Keywords

    cuboid fitting, Estimation, Image reconstruction, minimal solver, multi-model fitting, Scene abstraction, Shape, shape decomposition, Solid modeling, Surface reconstruction, Three-dimensional displays, Training

Cite this

Robust Shape Fitting for 3D Scene Abstraction. / Kluger, Florian; Brachmann, Eric; Yang, Michael Ying et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 19.03.2024, p. 1-18.

Kluger F, Brachmann E, Yang MY, Rosenhahn B. Robust Shape Fitting for 3D Scene Abstraction. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2024 Mar 19;1-18. Epub 2024 Mar 19. doi: 10.48550/arXiv.2403.10452, 10.1109/TPAMI.2024.3379014
Kluger, Florian ; Brachmann, Eric ; Yang, Michael Ying et al. / Robust Shape Fitting for 3D Scene Abstraction. In: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2024 ; pp. 1-18.
@article{efa0adbf6c6548cda932c632436b0a2d,
title = "Robust Shape Fitting for 3D Scene Abstraction",
abstract = "Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.",
keywords = "cuboid fitting, Estimation, Image reconstruction, minimal solver, multi-model fitting, Scene abstraction, Shape, shape decomposition, Solid modeling, Surface reconstruction, Three-dimensional displays, Training",
author = "Florian Kluger and Eric Brachmann and Yang, {Michael Ying} and Bodo Rosenhahn",
note = "Funding Information: This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).",
year = "2024",
month = mar,
day = "19",
doi = "10.48550/arXiv.2403.10452",
language = "English",
pages = "1--18",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",

}

TY - JOUR
T1 - Robust Shape Fitting for 3D Scene Abstraction
AU - Kluger, Florian
AU - Brachmann, Eric
AU - Yang, Michael Ying
AU - Rosenhahn, Bodo
N1 - Funding Information: This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).
PY - 2024/3/19
Y1 - 2024/3/19
AB - Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.

KW - cuboid fitting
KW - Estimation
KW - Image reconstruction
KW - minimal solver
KW - multi-model fitting
KW - Scene abstraction
KW - Shape
KW - shape decomposition
KW - Solid modeling
KW - Surface reconstruction
KW - Three-dimensional displays
KW - Training
UR - http://www.scopus.com/inward/record.url?scp=85188527432&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2403.10452
DO - 10.48550/arXiv.2403.10452
M3 - Article
AN - SCOPUS:85188527432
SP - 1
EP - 18
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
ER -