Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN

Research output: Contribution to journal › Article › Research › peer review

Authors

S. El Amrani Abouelassad, M. Mehltretter, F. Rottensteiner

Details

Original language: English
Pages (from-to): 499-516
Number of pages: 18
Journal: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
Volume: 92
Issue number: 5
Early online date: 16 Sept 2024
Publication status: Published - Oct 2024

Abstract

Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.

Keywords

    Autonomous driving, Object detection, Object reconstruction, Pose estimation, Shape estimation

Cite this

Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. / El Amrani Abouelassad, S.; Mehltretter, M.; Rottensteiner, F.
In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, Vol. 92, No. 5, 10.2024, p. 499-516.

El Amrani Abouelassad, S, Mehltretter, M & Rottensteiner, F 2024, 'Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN', PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 92, no. 5, pp. 499-516. https://doi.org/10.1007/s41064-024-00311-0
El Amrani Abouelassad, S., Mehltretter, M., & Rottensteiner, F. (2024). Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 92(5), 499-516. https://doi.org/10.1007/s41064-024-00311-0
El Amrani Abouelassad S, Mehltretter M, Rottensteiner F. Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 Oct;92(5):499-516. Epub 2024 Sept 16. doi: 10.1007/s41064-024-00311-0
El Amrani Abouelassad, S. ; Mehltretter, M. ; Rottensteiner, F. / Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 ; Vol. 92, No. 5. pp. 499-516.
@article{7fd7714d18b148b38200326cc8a3c536,
title = "Monocular Pose and Shape Reconstruction of Vehicles in {UAV} imagery using a Multi-task {CNN}",
abstract = "Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.",
keywords = "Autonomous driving, Object detection, Object reconstruction, Pose estimation, Shape estimation",
author = "{El Amrani Abouelassad}, S. and Mehltretter, M. and Rottensteiner, F.",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = oct,
doi = "10.1007/s41064-024-00311-0",
language = "English",
volume = "92",
pages = "499--516",
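number = "5",
journal = "PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science",
issn = "2512-2789",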
}

TY - JOUR
T1 - Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN
AU - El Amrani Abouelassad, S.
AU - Mehltretter, M.
AU - Rottensteiner, F.
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/10
Y1 - 2024/10
AB - Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.
KW - Autonomous driving
KW - Object detection
KW - Object reconstruction
KW - Pose estimation
KW - Shape estimation
UR - http://www.scopus.com/inward/record.url?scp=85204012518&partnerID=8YFLogxK
U2 - 10.1007/s41064-024-00311-0
DO - 10.1007/s41064-024-00311-0
M3 - Article
AN - SCOPUS:85204012518
VL - 92
SP - 499
EP - 516
JO - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
JF - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
SN - 2512-2789
IS - 5
ER -
