
Context-Aware Layout to Image Generation with Enhanced Object Appearance

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Sen He
  • Wentong Liao
  • Michael Ying Yang
  • Yongxin Yang
  • Yi-Zhe Song
  • Bodo Rosenhahn
  • Tao Xiang

External organisations

  • University of Surrey
  • University of Twente

Details

Original language: English
Title of host publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 15044-15053
Number of pages: 10
ISBN (electronic): 978-1-6654-4509-2
ISBN (print): 978-1-6654-4510-8
Publication status: Published - 2021

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (print): 1063-6919
ISSN (electronic): 2575-7075

Abstract

A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against a natural background (stuff), conditioned on a given layout. Built upon recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken, and (2) each object's appearance is typically distorted, lacking the key defining characteristics associated with the object class. We argue that these problems are caused by the lack of context-aware object and stuff feature encoding in their generators, and of location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.
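To make the Gram-matrix step above concrete, the following minimal PyTorch sketch (an illustration under stated assumptions, not the authors' released code; the function name gram_matrix and the toy input are hypothetical) computes a per-object Gram matrix from a batch of object feature maps, i.e. the channel-correlation statistic that the abstract describes feeding to the discriminator in place of raw image features.

import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (B, C, H, W) feature maps of generated object crops (hypothetical input).
    # returns:  (B, C, C) Gram matrices, normalised by the number of spatial positions.
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)             # flatten the spatial dimensions
    gram = torch.bmm(flat, flat.transpose(1, 2))  # channel-by-channel inner products
    return gram / (h * w)

# Toy usage: 4 generated object crops, 64-channel feature maps of size 16x16.
obj_feats = torch.randn(4, 64, 16, 16)
print(gram_matrix(obj_feats).shape)  # torch.Size([4, 64, 64])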


Cite

Context-Aware Layout to Image Generation with Enhanced Object Appearance. / He, Sen; Liao, Wentong; Yang, Michael Ying et al.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 15044-15053 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).


He, S, Liao, W, Yang, MY, Yang, Y, Song, Y-Z, Rosenhahn, B & Xiang, T 2021, Context-Aware Layout to Image Generation with Enhanced Object Appearance. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 15044-15053. https://doi.org/10.1109/CVPR46437.2021.01480
He, S., Liao, W., Yang, M. Y., Yang, Y., Song, Y.-Z., Rosenhahn, B., & Xiang, T. (2021). Context-Aware Layout to Image Generation with Enhanced Object Appearance. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 15044-15053). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR46437.2021.01480
He S, Liao W, Yang MY, Yang Y, Song YZ, Rosenhahn B et al. Context-Aware Layout to Image Generation with Enhanced Object Appearance. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc. 2021. pp. 15044-15053. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR46437.2021.01480
He, Sen ; Liao, Wentong ; Yang, Michael Ying et al. / Context-Aware Layout to Image Generation with Enhanced Object Appearance. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 15044-15053 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).
BibTeX
@inproceedings{0a5525459ebb47ed9a06d9bd49de027a,
title = "Context-Aware Layout to Image Generation with Enhanced Object Appearance",
abstract = "A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks. ",
keywords = "cs.CV",
author = "Sen He and Wentong Liao and Yang, {Michael Ying} and Yongxin Yang and Yi-Zhe Song and Bodo Rosenhahn and Tao Xiang",
note = "Funding Information: This work was supported by the Center for Digital Innovations (ZDIN), Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003) and the Deutsche Forschungsgemeinschaft (DFG) under Germany{\textquoteright}s Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122).",
year = "2021",
doi = "10.1109/CVPR46437.2021.01480",
language = "English",
isbn = "978-1-6654-4510-8",
series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "15044--15053",
booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
address = "United States",

}

RIS

TY - GEN

T1 - Context-Aware Layout to Image Generation with Enhanced Object Appearance

AU - He, Sen

AU - Liao, Wentong

AU - Yang, Michael Ying

AU - Yang, Yongxin

AU - Song, Yi-Zhe

AU - Rosenhahn, Bodo

AU - Xiang, Tao

N1 - Funding Information: This work was supported by the Center for Digital Innovations (ZDIN), Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003) and the Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122).

PY - 2021

Y1 - 2021

N2 - A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.

AB - A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.

KW - cs.CV

UR - http://www.scopus.com/inward/record.url?scp=85115173922&partnerID=8YFLogxK

U2 - 10.1109/CVPR46437.2021.01480

DO - 10.1109/CVPR46437.2021.01480

M3 - Conference contribution

SN - 978-1-6654-4510-8

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 15044

EP - 15053

BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
