Learning-Based Scalable Video Coding with Spatial and Temporal Prediction

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Martin Benjak
  • Yi-Hsin Chen
  • Wen-Hsiao Peng
  • Jörn Ostermann

External Research Organisations

  • National Yang Ming Chiao Tung University (NYCU)

Details

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350359855
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, Korea, Republic of
Duration: 4 Dec 2023 - 7 Dec 2023

Abstract

In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: a spatial inter-layer prediction signal, generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.
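The two-layer structure described in the abstract can be sketched as a toy codec. Everything below is a simplifying stand-in chosen for illustration, not the paper's actual components: average pooling replaces the BL's VVC encoding of the downsampled signal, nearest-neighbour upsampling replaces the learned super-resolution, the previous reconstructed frame replaces decoder-side motion compensation, a plain average replaces the learned fusion, and lossless residual coding replaces the conditional EL.

```python
import numpy as np

def downsample(frame, factor=2):
    """Average-pool the frame: stand-in for the BL's spatial downsampling."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(frame, factor=2):
    """Nearest-neighbour upsampling: stand-in for learned super-resolution."""
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)

def encode_frame(frame, prev_recon):
    # Base layer: quantised downsampled frame (VVC stand-in).
    bl = np.round(downsample(frame))
    # Spatial inter-layer prediction: upsampled BL output.
    spatial_pred = upsample(bl)
    # Temporal inter-frame prediction: here just the previous reconstruction
    # (the paper derives it by decoder-side motion compensation, so no
    # motion vectors are signaled).
    temporal_pred = prev_recon if prev_recon is not None else np.zeros_like(frame)
    # Fuse the two prediction signals (simple average as a fusion placeholder).
    fused = 0.5 * (spatial_pred + temporal_pred)
    # EL reduced to residual coding for this sketch; the paper instead
    # conditions a learned codec on the fused prediction.
    residual = frame - fused
    recon = fused + residual
    return bl, residual, recon
```

A decoder that receives only `bl` can reconstruct a low-resolution video, while a decoder that also receives the EL bitstream recovers the full-resolution signal; that split is what makes the scheme scalable.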

Keywords

    conditional coding, scalable coding, spatial scalability, video coding, VVC


Cite this

Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. / Benjak, Martin; Chen, Yi Hsin; Peng, Wen Hsiao et al.
2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023.

Benjak, M, Chen, YH, Peng, WH & Ostermann, J 2023, Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. in 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023, Jeju, Korea, Republic of, 4 Dec 2023. https://doi.org/10.1109/VCIP59821.2023.10402677
Benjak, M., Chen, Y. H., Peng, W. H., & Ostermann, J. (2023). Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. In 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/VCIP59821.2023.10402677
Benjak M, Chen YH, Peng WH, Ostermann J. Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. In 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc. 2023 doi: 10.1109/VCIP59821.2023.10402677
Benjak, Martin ; Chen, Yi Hsin ; Peng, Wen Hsiao et al. / Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023.
BibTeX
@inproceedings{cb17cad9bb0041e59b06124bd3d414f5,
title = "Learning-Based Scalable Video Coding with Spatial and Temporal Prediction",
abstract = "In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.",
keywords = "conditional coding, scalable coding, spatial scalability, video coding, VVC",
author = "Martin Benjak and Chen, {Yi Hsin} and Peng, {Wen Hsiao} and Jorn Ostermann",
year = "2023",
doi = "10.1109/VCIP59821.2023.10402677",
language = "English",
booktitle = "2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
note = "2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 ; Conference date: 04-12-2023 Through 07-12-2023",

}

RIS

TY - GEN

T1 - Learning-Based Scalable Video Coding with Spatial and Temporal Prediction

AU - Benjak, Martin

AU - Chen, Yi Hsin

AU - Peng, Wen Hsiao

AU - Ostermann, Jorn

PY - 2023

Y1 - 2023

N2 - In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.

AB - In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.

KW - conditional coding

KW - scalable coding

KW - spatial scalability

KW - video coding

KW - VVC

UR - http://www.scopus.com/inward/record.url?scp=85184853773&partnerID=8YFLogxK

U2 - 10.1109/VCIP59821.2023.10402677

DO - 10.1109/VCIP59821.2023.10402677

M3 - Conference contribution

AN - SCOPUS:85184853773

BT - 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

Y2 - 4 December 2023 through 7 December 2023

ER -