Details
Original language | English |
---|---|
Pages (from-to) | 4555-4570 |
Number of pages | 16 |
Journal | International Journal of Computer Vision |
Volume | 133 |
Issue number | 7 |
Early online date | 13 Mar 2025 |
Publication status | Published - Jul 2025 |
Abstract
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
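The abstract names an attribute-centric contrastive loss but does not give its formulation here. As a rough, hypothetical illustration only (not the paper's actual loss), an InfoNCE-style contrastive objective between image features and attribute-composition features could look like the PyTorch sketch below; all names (`attribute_contrastive_loss`, `img_feats`, `attr_feats`, `temperature`) are assumptions for illustration.

```python
# Hypothetical sketch only: a generic InfoNCE-style contrastive loss
# between image embeddings and attribute-composition embeddings.
# The paper's actual loss may be formulated differently.
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_feats: torch.Tensor,
                               attr_feats: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Pull each image embedding toward the embedding of its own
    attribute composition (the matching pair in the batch) and push it
    away from the attribute embeddings of all other samples."""
    img_feats = F.normalize(img_feats, dim=-1)    # unit norm -> cosine similarity
    attr_feats = F.normalize(attr_feats, dim=-1)
    logits = img_feats @ attr_feats.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(img_feats.size(0), device=img_feats.device)
    return F.cross_entropy(logits, targets)       # positives on the diagonal

# Example usage with random features (batch of 8, 256-dim embeddings):
# loss = attribute_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```

Under such a formulation, re-weighting or re-sampling batches so that overrepresented attribute compositions do not dominate the negatives would be one plausible way to discourage overfitting to them.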
Keywords
- Attribute-centric
- Compositional generation
- Text-to-image
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Computer Vision and Pattern Recognition
- Artificial Intelligence
Cite this
Attribute-Centric Compositional Text-to-Image Generation. / Cong, Yuren; Min, Martin Renqiang; Li, Li Erran; Rosenhahn, Bodo; Yang, Michael Ying. In: International Journal of Computer Vision, Vol. 133, No. 7, 07.2025, p. 4555-4570.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Attribute-Centric Compositional Text-to-Image Generation
AU - Cong, Yuren
AU - Min, Martin Renqiang
AU - Li, Li Erran
AU - Rosenhahn, Bodo
AU - Yang, Michael Ying
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/7
Y1 - 2025/7
N2 - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
AB - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
KW - Attribute-centric
KW - Compositional generation
KW - Text-to-image
UR - http://www.scopus.com/inward/record.url?scp=105000044048&partnerID=8YFLogxK
U2 - 10.1007/s11263-025-02371-0
DO - 10.1007/s11263-025-02371-0
M3 - Article
AN - SCOPUS:105000044048
VL - 133
SP - 4555
EP - 4570
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
SN - 0920-5691
IS - 7
ER -