Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks

Publication: Contribution to book/report/anthology/conference proceedings, conference paper, research, peer-reviewed

Authorship

External organizations

  • National Yang Ming Chiao Tung University (NSTC)

Details

Original language: English
Title of host publication: 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
Publisher: IEEE Computer Society
Pages: 1996-2001
Number of pages: 6
ISBN (electronic): 9798331523794
ISBN (print): 979-8-3315-2380-0
Publication status: Published - 14 Sept 2025
Event: 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States
Duration: 14 Sept 2025 to 17 Sept 2025

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (print): 1522-4880

Abstract

In this work, we present a learned multi-task video codec that is optimized for human and machine vision. The codec consists of an encoder that maps images from the pixel domain to a latent representation and multiple decoders that map the latent to either an image for human consumption or multiple task-specific features for different machine vision tasks. This allows a single bitstream to be used for multiple tasks while also reducing the decoder complexity for machine vision tasks. Unlike most learned codecs, our method performs inter-coding at the latent level instead of the pixel domain. Experiments show that the proposed method achieves a compression performance for machine vision tasks comparable to other multi-task codecs designed for machine vision only, while also providing video reconstruction.
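
The data flow described in the abstract (one shared encoder, inter-coding on latents rather than pixels, and several decoders reading the same bitstream) can be illustrated with a toy sketch. This is not the authors' implementation: the averaging "encoder", the quantization step `STEP`, the decoder names `task_a` and `task_b`, and the use of the previous reconstructed latent as the prediction are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

Latent = List[float]
STEP = 0.5  # hypothetical quantization step size (assumption)


def encode(frame: List[float]) -> Latent:
    """Toy analysis transform: average each pair of neighbouring samples."""
    return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame) - 1, 2)]


def decode_image(latent: Latent) -> List[float]:
    """Toy synthesis transform for human viewing: repeat each latent value twice."""
    return [v for v in latent for _ in range(2)]


def decode_task_a(latent: Latent) -> List[float]:
    """Hypothetical task head: minimum and maximum of the latent."""
    return [min(latent), max(latent)]


def decode_task_b(latent: Latent) -> List[float]:
    """Hypothetical task head: binary mask of values above the latent mean."""
    mean = sum(latent) / len(latent)
    return [1.0 if v > mean else 0.0 for v in latent]


# All decoders consume the same reconstructed latent, i.e. the same bitstream.
DECODERS: Dict[str, Callable[[Latent], List[float]]] = {
    "image": decode_image,
    "task_a": decode_task_a,
    "task_b": decode_task_b,
}


class LatentState:
    """Keeps the previously reconstructed latent as the inter prediction."""

    def __init__(self) -> None:
        self.reference: Optional[Latent] = None

    def predict(self, size: int) -> Latent:
        # The first frame has no reference, so predict zeros (intra-like case).
        return list(self.reference) if self.reference is not None else [0.0] * size


def code_frame(frame: List[float], state: LatentState) -> List[int]:
    """Sender side: quantize the latent-domain residual into integer symbols."""
    latent = encode(frame)
    prediction = state.predict(len(latent))
    symbols = [round((l - p) / STEP) for l, p in zip(latent, prediction)]
    # Track the reconstruction, not the original, so both sides stay in sync.
    state.reference = [p + s * STEP for p, s in zip(prediction, symbols)]
    return symbols


def reconstruct_latent(symbols: List[int], state: LatentState) -> Latent:
    """Receiver side: rebuild the latent from symbols and the prediction."""
    prediction = state.predict(len(symbols))
    latent = [p + s * STEP for p, s in zip(prediction, symbols)]
    state.reference = latent
    return latent


frames = [[1.0, 1.0, 3.0, 3.0], [1.0, 1.0, 4.0, 4.0]]
sender, receiver = LatentState(), LatentState()
bitstream = [code_frame(f, sender) for f in frames]
latents = [reconstruct_latent(s, receiver) for s in bitstream]
outputs = {name: [dec(l) for l in latents] for name, dec in DECODERS.items()}
```

In this sketch a machine-vision consumer only runs its small task head on the reconstructed latent and never has to reconstruct pixels first, which is the kind of decoder-complexity saving the abstract refers to.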

Cite this

Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks. / Benjak, Martin; Khan, Saifullah; Chen, Yi Hsin et al.
2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings. IEEE Computer Society, 2025. pp. 1996-2001 (Proceedings - International Conference on Image Processing, ICIP).

Benjak, M, Khan, S, Chen, YH, Peng, WH & Ostermann, J 2025, Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks. in 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings. Proceedings - International Conference on Image Processing, ICIP, IEEE Computer Society, pp. 1996-2001, 32nd IEEE International Conference on Image Processing, ICIP 2025, Anchorage, United States, 14 Sept. 2025. https://doi.org/10.1109/ICIP55913.2025.11084300
Benjak, M., Khan, S., Chen, Y. H., Peng, W. H., & Ostermann, J. (2025). Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks. In 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings (pp. 1996-2001). (Proceedings - International Conference on Image Processing, ICIP). IEEE Computer Society. https://doi.org/10.1109/ICIP55913.2025.11084300
Benjak M, Khan S, Chen YH, Peng WH, Ostermann J. Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks. in 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings. IEEE Computer Society. 2025. p. 1996-2001. (Proceedings - International Conference on Image Processing, ICIP). doi: 10.1109/ICIP55913.2025.11084300
Benjak, Martin ; Khan, Saifullah ; Chen, Yi Hsin et al. / Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks. 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings. IEEE Computer Society, 2025. pp. 1996-2001 (Proceedings - International Conference on Image Processing, ICIP).
@inproceedings{0b9a80203a764eacbfd8a5bbc1bb7dae,
title = "Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks",
abstract = "In this work, we present a learned multi-task video codec that is optimized for human and machine vision. The codec consists of an encoder that maps images from the pixel domain to a latent representation and multiple decoders that map the latent to either an image for human consumption or multiple task-specific features for different machine vision tasks. This allows a single bitstream to be used for multiple tasks while also reducing the decoder complexity for machine vision tasks. Unlike most learned codecs, our method performs inter-coding at the latent level instead of the pixel domain. Experiments show that the proposed method achieves a compression performance for machine vision tasks comparable to other multi-task codecs designed for machine vision only, while also providing video reconstruction.",
keywords = "feature compression, video coding, Video coding for machines",
author = "Martin Benjak and Saifullah Khan and Chen, {Yi Hsin} and Peng, {Wen Hsiao} and J{\"o}rn Ostermann",
note = "Publisher Copyright: {\textcopyright} 2025 IEEE; 32nd IEEE International Conference on Image Processing, ICIP 2025; Conference date: 14-09-2025 through 17-09-2025",
year = "2025",
month = sep,
day = "14",
doi = "10.1109/ICIP55913.2025.11084300",
language = "English",
isbn = "979-8-3315-2380-0",
series = "Proceedings - International Conference on Image Processing, ICIP",
publisher = "IEEE Computer Society",
pages = "1996--2001",
booktitle = "2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings",
address = "United States",
}

TY - GEN
T1 - Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks
AU - Benjak, Martin
AU - Khan, Saifullah
AU - Chen, Yi Hsin
AU - Peng, Wen Hsiao
AU - Ostermann, Jörn
N1 - Publisher Copyright: ©2025 IEEE.
PY - 2025/9/14
Y1 - 2025/9/14
AB - In this work, we present a learned multi-task video codec that is optimized for human and machine vision. The codec consists of an encoder that maps images from the pixel domain to a latent representation and multiple decoders that map the latent to either an image for human consumption or multiple task-specific features for different machine vision tasks. This allows a single bitstream to be used for multiple tasks while also reducing the decoder complexity for machine vision tasks. Unlike most learned codecs, our method performs inter-coding at the latent level instead of the pixel domain. Experiments show that the proposed method achieves a compression performance for machine vision tasks comparable to other multi-task codecs designed for machine vision only, while also providing video reconstruction.
KW - feature compression
KW - video coding
KW - Video coding for machines
UR - http://www.scopus.com/inward/record.url?scp=105028627594&partnerID=8YFLogxK
U2 - 10.1109/ICIP55913.2025.11084300
DO - 10.1109/ICIP55913.2025.11084300
M3 - Conference contribution
AN - SCOPUS:105028627594
SN - 979-8-3315-2380-0
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1996
EP - 2001
BT - 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
PB - IEEE Computer Society
T2 - 32nd IEEE International Conference on Image Processing, ICIP 2025
Y2 - 14 September 2025 through 17 September 2025
ER -
