
Attribute-Centric Compositional Text-to-Image Generation

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Yuren Cong
  • Martin Renqiang Min
  • Li Erran Li
  • Bodo Rosenhahn
  • Michael Ying Yang

External organisations

  • NEC Laboratories America, Inc.
  • Amazon.com, Inc.
  • University of Bath

Details

Original language: English
Pages (from - to): 4555-4570
Number of pages: 16
Journal: International Journal of Computer Vision
Volume: 133
Issue number: 7
Early online date: 13 Mar 2025
Publication status: Published - July 2025

Abstract

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
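This record gives no implementation details beyond the abstract; for the full attribute-centric contrastive loss, see the linked repository. Purely as an illustration of the general family of objective the abstract mentions, below is a minimal sketch of a generic InfoNCE-style contrastive loss over paired embeddings. It is not the paper's actual loss, and the function name, temperature value, and pairing convention are all illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss (illustrative, not ACTIG's).

    anchors, positives: (N, D) arrays; row i of `positives` is the matching
    pair of row i of `anchors`, and all other rows serve as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of each anchor's matching positive.
    return -np.mean(np.diag(log_probs))
```

The intuition matches the abstract's stated goal: pulling matched pairs together while pushing non-matching compositions apart discourages the model from collapsing onto frequently seen (overrepresented) combinations.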

ASJC Scopus subject areas

Cite this

Attribute-Centric Compositional Text-to-Image Generation. / Cong, Yuren; Min, Martin Renqiang; Li, Li Erran et al.
In: International Journal of Computer Vision, Vol. 133, No. 7, 07.2025, pp. 4555-4570.


Cong Y, Min MR, Li LE, Rosenhahn B, Yang MY. Attribute-Centric Compositional Text-to-Image Generation. International Journal of Computer Vision. 2025 Jul;133(7):4555-4570. Epub 2025 Mar 13. doi: 10.1007/s11263-025-02371-0
Cong, Yuren ; Min, Martin Renqiang ; Li, Li Erran et al. / Attribute-Centric Compositional Text-to-Image Generation. In: International Journal of Computer Vision. 2025 ; Vol. 133, No. 7. pp. 4555-4570.
@article{d162e00837fa4192afc7bfe69c6f227a,
title = "Attribute-Centric Compositional Text-to-Image Generation",
abstract = "Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves the model{\textquoteright}s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.",
keywords = "Attribute-centric, Compositional generation, Text-to-image",
author = "Yuren Cong and Min, {Martin Renqiang} and Li, {Li Erran} and Bodo Rosenhahn and Yang, {Michael Ying}",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = jul,
doi = "10.1007/s11263-025-02371-0",
language = "English",
volume = "133",
pages = "4555--4570",
journal = "International Journal of Computer Vision",
issn = "0920-5691",
publisher = "Springer Netherlands",
number = "7",

}


TY - JOUR

T1 - Attribute-Centric Compositional Text-to-Image Generation

AU - Cong, Yuren

AU - Min, Martin Renqiang

AU - Li, Li Erran

AU - Rosenhahn, Bodo

AU - Yang, Michael Ying

N1 - Publisher Copyright: © The Author(s) 2025.

PY - 2025/7

Y1 - 2025/7

N2 - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.

AB - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.

KW - Attribute-centric

KW - Compositional generation

KW - Text-to-image

UR - http://www.scopus.com/inward/record.url?scp=105000044048&partnerID=8YFLogxK

U2 - 10.1007/s11263-025-02371-0

DO - 10.1007/s11263-025-02371-0

M3 - Article

AN - SCOPUS:105000044048

VL - 133

SP - 4555

EP - 4570

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

SN - 0920-5691

IS - 7

ER -
