Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting

Yiming Xu; Hao Cheng; Monika Sester

doi:10.48550/arXiv.2402.03981

Details

Original language	English
Title of host publication	2024 IEEE Intelligent Vehicles Symposium (IV)
Pages	2397-2404
Number of pages	8
ISBN (electronic)	979-8-3503-4881-1
Publication status	Published - 6 Feb 2024

Publication series

Name	IEEE Intelligent Vehicles Symposium, Proceedings
ISSN (Print)	1931-0587
ISSN (electronic)	2642-7214

Abstract

In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative models trained on curated data to learn road agents' stochastic behavior bounded by scene constraints. However, they often face mode averaging issues due to data imbalance and simplistic priors, and could even suffer from mode collapse due to unstable training and single ground truth supervision. These issues lead the existing methods to a loss of predictive diversity and adherence to the scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left. Moreover, we incorporate the predicted endpoints as an alternative behavioral token into the CDT model to facilitate the prediction of accurate trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels in generating diverse and scene-compliant trajectories in complex urban settings.

Keywords

cs.CV

ASJC Scopus subject areas

Computer Science(all)
Computer Science Applications
Engineering(all)
Automotive Engineering
Mathematics(all)
Modelling and Simulation

Cite this

Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting. / Xu, Yiming; Cheng, Hao; Sester, Monika.
2024 IEEE Intelligent Vehicles Symposium (IV). 2024. p. 2397-2404 (IEEE Intelligent Vehicles Symposium, Proceedings).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Xu, Y, Cheng, H & Sester, M 2024, Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting. in 2024 IEEE Intelligent Vehicles Symposium (IV). IEEE Intelligent Vehicles Symposium, Proceedings, pp. 2397-2404. https://doi.org/10.48550/arXiv.2402.03981, https://doi.org/10.1109/IV55156.2024.10588486

Xu, Y., Cheng, H., & Sester, M. (2024). Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting. In 2024 IEEE Intelligent Vehicles Symposium (IV) (pp. 2397-2404). (IEEE Intelligent Vehicles Symposium, Proceedings). https://doi.org/10.48550/arXiv.2402.03981, https://doi.org/10.1109/IV55156.2024.10588486

Xu Y, Cheng H, Sester M. Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting. In 2024 IEEE Intelligent Vehicles Symposium (IV). 2024. p. 2397-2404. (IEEE Intelligent Vehicles Symposium, Proceedings). doi: 10.48550/arXiv.2402.03981, 10.1109/IV55156.2024.10588486

Xu, Yiming ; Cheng, Hao ; Sester, Monika. / Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting. 2024 IEEE Intelligent Vehicles Symposium (IV). 2024. pp. 2397-2404 (IEEE Intelligent Vehicles Symposium, Proceedings).

@inproceedings{fe1c66175b504b3494eec26ea859e19e,
  title = "Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting",
  abstract = "In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative models trained on curated data to learn road agents' stochastic behavior bounded by scene constraints. However, they often face mode averaging issues due to data imbalance and simplistic priors, and could even suffer from mode collapse due to unstable training and single ground truth supervision. These issues lead the existing methods to a loss of predictive diversity and adherence to the scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left. Moreover, we incorporate the predicted endpoints as an alternative behavioral token into the CDT model to facilitate the prediction of accurate trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels in generating diverse and scene-compliant trajectories in complex urban settings.",
  keywords = "cs.CV",
  author = "Yiming Xu and Hao Cheng and Monika Sester",
  year = "2024",
  month = feb,
  day = "6",
  doi = "10.48550/arXiv.2402.03981",
  language = "English",
  isbn = "979-8-3503-4882-8",
  series = "IEEE Intelligent Vehicles Symposium, Proceedings",
  pages = "2397--2404",
  booktitle = "2024 IEEE Intelligent Vehicles Symposium (IV)",
}

TY - GEN
T1 - Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting
AU - Xu, Yiming
AU - Cheng, Hao
AU - Sester, Monika
PY - 2024/2/6
Y1 - 2024/2/6
N2 - In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative models trained on curated data to learn road agents' stochastic behavior bounded by scene constraints. However, they often face mode averaging issues due to data imbalance and simplistic priors, and could even suffer from mode collapse due to unstable training and single ground truth supervision. These issues lead the existing methods to a loss of predictive diversity and adherence to the scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left. Moreover, we incorporate the predicted endpoints as an alternative behavioral token into the CDT model to facilitate the prediction of accurate trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels in generating diverse and scene-compliant trajectories in complex urban settings.
AB - In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative models trained on curated data to learn road agents' stochastic behavior bounded by scene constraints. However, they often face mode averaging issues due to data imbalance and simplistic priors, and could even suffer from mode collapse due to unstable training and single ground truth supervision. These issues lead the existing methods to a loss of predictive diversity and adherence to the scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left. Moreover, we incorporate the predicted endpoints as an alternative behavioral token into the CDT model to facilitate the prediction of accurate trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels in generating diverse and scene-compliant trajectories in complex urban settings.
KW - cs.CV
UR - http://www.scopus.com/inward/record.url?scp=85199753378&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2402.03981
DO - 10.48550/arXiv.2402.03981
M3 - Conference contribution
SN - 979-8-3503-4882-8
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
SP - 2397
EP - 2404
BT - 2024 IEEE Intelligent Vehicles Symposium (IV)
ER -

By the same author(s)

Integrated Multi-Stereo Camera System for Robust Indoor Localization with Temporal Fusion

Investigating Effects of Future Path Visualisation on Path Choices During Collision Encounters

3D Uncertain Implicit Surface Mapping Using GMM and GP

Visualising Collision Spot Uncertainty with Augmented Reality

StreamLTS: Query-Based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection