Semantic segmentation of mobile mapping point clouds via multi-view label transfer

Research output: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Torben Peters
  • Claus Brenner
  • Konrad Schindler

External Research Organisations

  • ETH Zurich

Details

Original language: English
Pages (from-to): 30-39
Number of pages: 10
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 202
Early online date: 8 Jun 2023
Publication status: Published - Aug 2023

Abstract

We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints with 3D geometry information into a common representation that encodes point-wise 3D semantics. To validate our approach, we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.
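
The "naive" label transfer mentioned in the abstract can be pictured as projecting every 3D point into each calibrated camera, reading off the 2D network's per-pixel prediction, and voting across views. The following is a minimal sketch of that baseline only, not of the paper's learned multi-view fusion; all names (K, R, t, the label-map layout) are illustrative assumptions, not taken from the paper or its code.

    # Sketch of naive 2D-to-3D label transfer by per-point majority vote over views.
    # NOT the paper's learned fusion; variable names and conventions are assumptions.
    import numpy as np

    def project_points(points, K, R, t):
        # world -> camera coordinates (R, t assumed to be the world-to-camera pose)
        cam = points @ R.T + t
        z = cam[:, 2]
        uvw = cam @ K.T                    # apply pinhole intrinsics
        uv = uvw[:, :2] / uvw[:, 2:3]      # perspective division
        return uv, z

    def transfer_labels(points, views, num_classes, ignore_label=-1):
        # views: list of dicts with 'K' (3x3), 'R' (3x3), 't' (3,) and
        # 'labels' (HxW integer array of 2D segmentation predictions)
        votes = np.zeros((points.shape[0], num_classes), dtype=np.int64)
        for view in views:
            uv, z = project_points(points, view['K'], view['R'], view['t'])
            h, w = view['labels'].shape
            u = np.round(uv[:, 0]).astype(int)
            v = np.round(uv[:, 1]).astype(int)
            # keep only points in front of the camera and inside the image
            valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
            idx = np.flatnonzero(valid)
            votes[idx, view['labels'][v[idx], u[idx]]] += 1
        labels = votes.argmax(axis=1)
        labels[votes.sum(axis=1) == 0] = ignore_label   # points observed in no view
        return labels

Because such a vote ignores occlusions and residual calibration errors, it introduces exactly the kind of label noise that the paper quantifies and that the learned 2D-to-3D multi-view transfer is designed to correct.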

Keywords

    3D point clouds, Convolutional neural network (CNN), Label transfer, Multi-view, Semantic segmentation

Cite this

Semantic segmentation of mobile mapping point clouds via multi-view label transfer. / Peters, Torben; Brenner, Claus; Schindler, Konrad.
In: ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 202, 08.2023, p. 30-39.

Peters T, Brenner C, Schindler K. Semantic segmentation of mobile mapping point clouds via multi-view label transfer. ISPRS Journal of Photogrammetry and Remote Sensing. 2023 Aug;202:30-39. Epub 2023 Jun 8. doi: 10.1016/j.isprsjprs.2023.05.018
Peters, Torben ; Brenner, Claus ; Schindler, Konrad. / Semantic segmentation of mobile mapping point clouds via multi-view label transfer. In: ISPRS Journal of Photogrammetry and Remote Sensing. 2023 ; Vol. 202. pp. 30-39.
BibTeX
@article{4048898567a94bf6b882d2e0469b06c9,
title = "Semantic segmentation of mobile mapping point clouds via multi-view label transfer",
abstract = "We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints with 3D geometry information into a common representation that encodes point-wise 3D semantics. To validate our approach, we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.",
keywords = "3D point clouds, Convolutional neural network (CNN), Label transfer, Multi-view, Semantic segmentation",
author = "Torben Peters and Claus Brenner and Konrad Schindler",
note = "Funding Information: Part of the research was carried out within the Research Training Group GRK2159, {\textquoteleft}Integrity and collaboration in dynamic sensor networks{\textquoteright} (i.c.sens), which is funded by the German Research Foundation (DFG).",
year = "2023",
month = aug,
doi = "10.1016/j.isprsjprs.2023.05.018",
language = "English",
volume = "202",
pages = "30--39",
journal = "ISPRS Journal of Photogrammetry and Remote Sensing",
issn = "0924-2716",
publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - Semantic segmentation of mobile mapping point clouds via multi-view label transfer

AU - Peters, Torben

AU - Brenner, Claus

AU - Schindler, Konrad

N1 - Funding Information: Part of the research was carried out within the Research Training Group GRK2159, ‘Integrity and collaboration in dynamic sensor networks’ (i.c.sens), which is funded by the German Research Foundation (DFG).

PY - 2023/8

Y1 - 2023/8

N2 - We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints with 3D geometry information into a common representation that encodes point-wise 3D semantics. To validate our approach, we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.

AB - We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints with 3D geometry information into a common representation that encodes point-wise 3D semantics. To validate our approach, we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.

KW - 3D point clouds

KW - Convolutional neural network (CNN)

KW - Label transfer

KW - Multi-view

KW - Semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85161691263&partnerID=8YFLogxK

U2 - 10.1016/j.isprsjprs.2023.05.018

DO - 10.1016/j.isprsjprs.2023.05.018

M3 - Article

AN - SCOPUS:85161691263

VL - 202

SP - 30

EP - 39

JO - ISPRS Journal of Photogrammetry and Remote Sensing

JF - ISPRS Journal of Photogrammetry and Remote Sensing

SN - 0924-2716

ER -