Attribute-Centric Compositional Text-to-Image Generation

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Yuren Cong
  • Martin Renqiang Min
  • Li Erran Li
  • Bodo Rosenhahn
  • Michael Ying Yang

Research Organisations

External Research Organisations

  • NEC Laboratories America, Inc.
  • Amazon.com, Inc.
  • University of Bath

Details

Original language: English
Pages (from-to): 4555-4570
Number of pages: 16
Journal: International Journal of Computer Vision
Volume: 133
Issue number: 7
Early online date: 13 Mar 2025
Publication status: Published - Jul 2025

Abstract

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that ACTIG achieves outstanding compositional generalization and outperforms previous works in image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
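
To make the abstract's mention of a contrastive loss concrete: a contrastive objective pulls matching image/attribute pairs together in embedding space and pushes mismatched pairs apart, so the model cannot simply memorize frequent attribute compositions. The following is a minimal sketch of a generic InfoNCE-style loss over paired image and attribute embeddings. It is not the authors' ACTIG implementation (see the linked repository for that); the function name attr_contrastive_loss, the temperature value, and the feature shapes are illustrative assumptions.

# Hypothetical sketch of a contrastive loss over attribute embeddings,
# in the spirit of the "attribute-centric contrastive loss" described in
# the abstract. NOT the authors' ACTIG code; see https://github.com/yrcong/ACTIG.
import torch
import torch.nn.functional as F

def attr_contrastive_loss(image_feats: torch.Tensor,
                          attr_feats: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: the i-th image embedding should match the i-th
    attribute embedding and repel all other pairs in the batch."""
    img = F.normalize(image_feats, dim=-1)   # (B, D), unit-norm rows
    att = F.normalize(attr_feats, dim=-1)    # (B, D), unit-norm rows
    logits = img @ att.t() / temperature     # (B, B) cosine-similarity matrix
    targets = torch.arange(img.size(0), device=img.device)  # diagonal is positive
    # Symmetric cross-entropy over image->attribute and attribute->image
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    # Random features standing in for image/attribute encoder outputs.
    imgs = torch.randn(8, 256)
    attrs = torch.randn(8, 256)
    print(attr_contrastive_loss(imgs, attrs).item())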

Keywords

    Attribute-centric, Compositional generation, Text-to-image

Cite this

Attribute-Centric Compositional Text-to-Image Generation. / Cong, Yuren; Min, Martin Renqiang; Li, Li Erran et al.
In: International Journal of Computer Vision, Vol. 133, No. 7, 07.2025, p. 4555-4570.

Cong Y, Min MR, Li LE, Rosenhahn B, Yang MY. Attribute-Centric Compositional Text-to-Image Generation. International Journal of Computer Vision. 2025 Jul;133(7):4555-4570. Epub 2025 Mar 13. doi: 10.1007/s11263-025-02371-0
Cong, Yuren ; Min, Martin Renqiang ; Li, Li Erran et al. / Attribute-Centric Compositional Text-to-Image Generation. In: International Journal of Computer Vision. 2025 ; Vol. 133, No. 7. pp. 4555-4570.
BibTeX
@article{d162e00837fa4192afc7bfe69c6f227a,
title = "Attribute-Centric Compositional Text-to-Image Generation",
abstract = "Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model{\textquoteright}s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.",
keywords = "Attribute-centric, Compositional generation, Text-to-image",
author = "Yuren Cong and Min, {Martin Renqiang} and Li, {Li Erran} and Bodo Rosenhahn and Yang, {Michael Ying}",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = jul,
doi = "10.1007/s11263-025-02371-0",
language = "English",
volume = "133",
pages = "4555--4570",
journal = "International Journal of Computer Vision",
issn = "0920-5691",
publisher = "Springer Netherlands",
number = "7",

}

RIS

TY  - JOUR
T1  - Attribute-Centric Compositional Text-to-Image Generation
AU  - Cong, Yuren
AU  - Min, Martin Renqiang
AU  - Li, Li Erran
AU  - Rosenhahn, Bodo
AU  - Yang, Michael Ying
N1  - Publisher Copyright: © The Author(s) 2025.
PY  - 2025/7
Y1  - 2025/7
N2  - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that ACTIG achieves outstanding compositional generalization and outperforms previous works in image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
AB  - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that ACTIG achieves outstanding compositional generalization and outperforms previous works in image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
KW  - Attribute-centric
KW  - Compositional generation
KW  - Text-to-image
UR  - http://www.scopus.com/inward/record.url?scp=105000044048&partnerID=8YFLogxK
U2  - 10.1007/s11263-025-02371-0
DO  - 10.1007/s11263-025-02371-0
M3  - Article
AN  - SCOPUS:105000044048
VL  - 133
SP  - 4555
EP  - 4570
JO  - International Journal of Computer Vision
JF  - International Journal of Computer Vision
SN  - 0920-5691
IS  - 7
ER  -
