
SparseAlign: A Fully Sparse Framework for Cooperative Object Detection

Research output: Contribution to journal › Conference article › Research › peer review

Authors

  • Yunshuang Yuan
  • Yan Xia
  • Daniel Cremers
  • Monika Sester

External Research Organisations

  • Technical University of Munich (TUM)
  • Munich Center for Machine Learning (MCML)

Details

Original language: English
Journal: CVPR
Publication status: E-pub ahead of print - 17 Mar 2025

Abstract

Cooperative perception can enlarge an ego vehicle's field of view and reduce occlusion, thereby improving the perception performance and safety of autonomous driving. Although previous works on cooperative object detection have been successful, they mostly operate on dense Bird's Eye View (BEV) feature maps, which are computationally demanding and hard to extend to long-range detection problems. More efficient, fully sparse frameworks are rarely explored. In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features. Extensive experimental results on both the OPV2V and DairV2X datasets show that our framework, despite its sparsity, outperforms the state of the art with lower communication bandwidth requirements. In addition, experiments on the OPV2Vt and DairV2Xt datasets for time-aligned cooperative object detection also show a significant performance gain compared to the baseline works.
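The abstract names three building blocks (a sparse 3D backbone, a query-based temporal context learning module, and a detection head for sparse features); consult the paper for their actual design. The sketch below is only a minimal, hypothetical PyTorch-style illustration of how such a fully sparse cooperative pipeline could be wired together: each agent encodes its point cloud into a small set of sparse per-voxel features, these sets are shared and fused, and object queries decode them into boxes. All class names, tensor shapes, and hyperparameters are illustrative assumptions, not the authors' implementation; coordinate alignment between agents and the use of previous frames are deliberately omitted.

# Illustrative sketch only, not the SparseAlign code. Shows the three-component layout
# named in the abstract with stand-in modules and made-up names/shapes.
import torch
import torch.nn as nn


class SparseBackbone(nn.Module):
    # Stand-in for the enhanced sparse 3D backbone: maps per-voxel point features
    # to per-voxel embeddings without ever densifying them into a BEV grid.
    def __init__(self, in_dim, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, voxel_feats):           # (N, in_dim) -> (N, dim)
        return self.mlp(voxel_feats)


class TemporalContextModule(nn.Module):
    # Stand-in for the query-based temporal context learning module. Here it simply
    # cross-attends learned object queries to the fused sparse features of the current
    # frame; the paper's module additionally exploits previous frames.
    def __init__(self, dim, num_queries):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, sparse_feats):          # (N, dim) -> (Q, dim)
        q = self.queries.unsqueeze(0)         # (1, Q, dim)
        kv = sparse_feats.unsqueeze(0)        # (1, N, dim)
        out, _ = self.attn(q, kv, kv)
        return out.squeeze(0)


class SparseDetectionHead(nn.Module):
    # Stand-in for the detection head tailored to sparse features: one class score
    # and one box (center, size, yaw) per query.
    def __init__(self, dim, num_classes):
        super().__init__()
        self.cls = nn.Linear(dim, num_classes)
        self.box = nn.Linear(dim, 7)          # x, y, z, l, w, h, yaw

    def forward(self, query_feats):
        return self.cls(query_feats), self.box(query_feats)


class CooperativeSparseDetector(nn.Module):
    # Each agent encodes its own point cloud sparsely; only the small sparse feature
    # sets are shared and concatenated, which is where the bandwidth saving comes from.
    def __init__(self, in_dim=16, dim=128, num_queries=100, num_classes=1):
        super().__init__()
        self.backbone = SparseBackbone(in_dim, dim)
        self.temporal = TemporalContextModule(dim, num_queries)
        self.head = SparseDetectionHead(dim, num_classes)

    def forward(self, agent_voxel_feats):     # list of (N_i, in_dim) tensors
        shared = torch.cat([self.backbone(f) for f in agent_voxel_feats], dim=0)
        query_feats = self.temporal(shared)
        return self.head(query_feats)


if __name__ == "__main__":
    ego, coop = torch.randn(500, 16), torch.randn(300, 16)   # two agents' voxel features
    scores, boxes = CooperativeSparseDetector()([ego, coop])
    print(scores.shape, boxes.shape)          # (100, 1) and (100, 7)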

Keywords

    cs.CV

Cite this

SparseAlign: A Fully Sparse Framework for Cooperative Object Detection. / Yuan, Yunshuang; Xia, Yan; Cremers, Daniel et al.
In: CVPR, 17.03.2025.


Yuan Y, Xia Y, Cremers D, Sester M. SparseAlign: A Fully Sparse Framework for Cooperative Object Detection. CVPR. 2025 Mar 17. Epub 2025 Mar 17. doi: 10.48550/arXiv.2503.12982
Yuan, Yunshuang; Xia, Yan; Cremers, Daniel et al. / SparseAlign: A Fully Sparse Framework for Cooperative Object Detection. In: CVPR. 2025.
BibTeX
@article{c10f90057d224336a895bd0f17273500,
title = "SparseAlign: A Fully Sparse Framework for Cooperative Object Detection",
abstract = "Cooperative perception can increase the view field and decrease the occlusion of an ego vehicle, hence improving the perception performance and safety of autonomous driving. Despite the success of previous works on cooperative object detection, they mostly operate on dense Bird's Eye View (BEV) feature maps, which are computationally demanding and can hardly be extended to long-range detection problems. More efficient fully sparse frameworks are rarely explored. In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features. Extensive experimental results on both OPV2V and DairV2X datasets show that our framework, despite its sparsity, outperforms the state of the art with less communication bandwidth requirements. In addition, experiments on the OPV2Vt and DairV2Xt datasets for time-aligned cooperative object detection also show a significant performance gain compared to the baseline works.",
keywords = "cs.CV",
author = "Yunshuang Yuan and Yan Xia and Daniel Cremers and Monika Sester",
note = "DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.",
year = "2025",
month = mar,
day = "17",
doi = "10.48550/arXiv.2503.12982",
language = "English",

}

RIS

TY - JOUR

T1 - SparseAlign

T2 - A Fully Sparse Framework for Cooperative Object Detection

AU - Yuan, Yunshuang

AU - Xia, Yan

AU - Cremers, Daniel

AU - Sester, Monika

N1 - DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

PY - 2025/3/17

Y1 - 2025/3/17

N2 - Cooperative perception can increase the view field and decrease the occlusion of an ego vehicle, hence improving the perception performance and safety of autonomous driving. Despite the success of previous works on cooperative object detection, they mostly operate on dense Bird's Eye View (BEV) feature maps, which are computationally demanding and can hardly be extended to long-range detection problems. More efficient fully sparse frameworks are rarely explored. In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features. Extensive experimental results on both OPV2V and DairV2X datasets show that our framework, despite its sparsity, outperforms the state of the art with less communication bandwidth requirements. In addition, experiments on the OPV2Vt and DairV2Xt datasets for time-aligned cooperative object detection also show a significant performance gain compared to the baseline works.

AB - Cooperative perception can increase the view field and decrease the occlusion of an ego vehicle, hence improving the perception performance and safety of autonomous driving. Despite the success of previous works on cooperative object detection, they mostly operate on dense Bird's Eye View (BEV) feature maps, which are computationally demanding and can hardly be extended to long-range detection problems. More efficient fully sparse frameworks are rarely explored. In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features. Extensive experimental results on both OPV2V and DairV2X datasets show that our framework, despite its sparsity, outperforms the state of the art with less communication bandwidth requirements. In addition, experiments on the OPV2Vt and DairV2Xt datasets for time-aligned cooperative object detection also show a significant performance gain compared to the baseline works.

KW - cs.CV

U2 - 10.48550/arXiv.2503.12982

DO - 10.48550/arXiv.2503.12982

M3 - Conference article

JO - CVPR

JF - CVPR

ER -
