Learning-Based Scalable Video Coding with Spatial and Temporal Prediction

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Martin Benjak
  • Yi-Hsin Chen
  • Wen-Hsiao Peng
  • Jörn Ostermann

External Research Organisations

  • National Yang Ming Chiao Tung University (NYCU)

Details

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350359855
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, Korea, Republic of
Duration: 4 Dec 2023 - 7 Dec 2023

Abstract

In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: a spatial inter-layer prediction signal, generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.
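The two-layer structure described in the abstract can be sketched as a toy codec. Everything below is a simplifying stand-in chosen for illustration, not the paper's actual components: average pooling replaces the BL's VVC encoding of the downsampled signal, nearest-neighbour upsampling replaces the learned super-resolution, the previous reconstructed frame replaces decoder-side motion compensation, a plain average replaces the learned fusion, and lossless residual coding replaces the conditional EL.

```python
import numpy as np

def downsample(frame, factor=2):
    """Average-pool the frame: stand-in for the BL's spatial downsampling."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(frame, factor=2):
    """Nearest-neighbour upsampling: stand-in for learned super-resolution."""
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)

def encode_frame(frame, prev_recon):
    # Base layer: quantised downsampled frame (VVC stand-in).
    bl = np.round(downsample(frame))
    # Spatial inter-layer prediction: upsampled BL output.
    spatial_pred = upsample(bl)
    # Temporal inter-frame prediction: here just the previous reconstruction
    # (the paper derives it by decoder-side motion compensation, so no
    # motion vectors are signaled).
    temporal_pred = prev_recon if prev_recon is not None else np.zeros_like(frame)
    # Fuse the two prediction signals (simple average as a fusion placeholder).
    fused = 0.5 * (spatial_pred + temporal_pred)
    # EL reduced to residual coding for this sketch; the paper instead
    # conditions a learned codec on the fused prediction.
    residual = frame - fused
    recon = fused + residual
    return bl, residual, recon
```

A decoder that receives only `bl` can reconstruct a low-resolution video, while a decoder that also receives the EL bitstream recovers the full-resolution signal; that split is what makes the scheme scalable.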

Keywords

    conditional coding, scalable coding, spatial scalability, video coding, VVC


Cite this

Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. / Benjak, Martin; Chen, Yi Hsin; Peng, Wen Hsiao et al.
2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023.

Benjak, M, Chen, YH, Peng, WH & Ostermann, J 2023, Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. in 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023, Jeju, Korea, Republic of, 4 Dec 2023. https://doi.org/10.1109/VCIP59821.2023.10402677
Benjak, M., Chen, Y. H., Peng, W. H., & Ostermann, J. (2023). Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. In 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/VCIP59821.2023.10402677
Benjak M, Chen YH, Peng WH, Ostermann J. Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. In 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc. 2023 doi: 10.1109/VCIP59821.2023.10402677
Benjak, Martin ; Chen, Yi Hsin ; Peng, Wen Hsiao et al. / Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023.
BibTeX
@inproceedings{cb17cad9bb0041e59b06124bd3d414f5,
title = "Learning-Based Scalable Video Coding with Spatial and Temporal Prediction",
abstract = "In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.",
keywords = "conditional coding, scalable coding, spatial scalability, video coding, VVC",
author = "Martin Benjak and Chen, {Yi Hsin} and Peng, {Wen Hsiao} and Jorn Ostermann",
year = "2023",
doi = "10.1109/VCIP59821.2023.10402677",
language = "English",
booktitle = "2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
note = "2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 ; Conference date: 04-12-2023 Through 07-12-2023",

}

RIS

TY - GEN

T1 - Learning-Based Scalable Video Coding with Spatial and Temporal Prediction

AU - Benjak, Martin

AU - Chen, Yi Hsin

AU - Peng, Wen Hsiao

AU - Ostermann, Jorn

PY - 2023

Y1 - 2023

N2 - In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.

AB - In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.

KW - conditional coding

KW - scalable coding

KW - spatial scalability

KW - video coding

KW - VVC

UR - http://www.scopus.com/inward/record.url?scp=85184853773&partnerID=8YFLogxK

U2 - 10.1109/VCIP59821.2023.10402677

DO - 10.1109/VCIP59821.2023.10402677

M3 - Conference contribution

AN - SCOPUS:85184853773

BT - 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

Y2 - 4 December 2023 through 7 December 2023

ER -