Attribute-Centric Compositional Text-to-Image Generation

Yuren Cong; Martin Renqiang Min; Li Erran Li; Bodo Rosenhahn; Michael Ying Yang

doi:10.1007/s11263-025-02371-0

Details

Original language	English
Pages (from - to)	4555-4570
Number of pages	16
Journal	International Journal of Computer Vision
Volume	133
Issue number	7
Early online date	13 March 2025
Publication status	Published - July 2025

Abstract

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.

ASJC Scopus subject areas

Computer Science (all)
Software
Computer Science (all)
Computer Vision and Pattern Recognition
Computer Science (all)
Artificial Intelligence

Cite this

Attribute-Centric Compositional Text-to-Image Generation. / Cong, Yuren; Min, Martin Renqiang; Li, Li Erran et al.
In: International Journal of Computer Vision, Vol. 133, No. 7, 07.2025, p. 4555-4570.

Research output: Contribution to journal › Article › Research › peer review

Cong, Y, Min, MR, Li, LE, Rosenhahn, B & Yang, MY 2025, 'Attribute-Centric Compositional Text-to-Image Generation', International Journal of Computer Vision, vol. 133, no. 7, pp. 4555-4570. https://doi.org/10.1007/s11263-025-02371-0

Cong, Y., Min, M. R., Li, L. E., Rosenhahn, B., & Yang, M. Y. (2025). Attribute-Centric Compositional Text-to-Image Generation. International Journal of Computer Vision, 133(7), 4555-4570. https://doi.org/10.1007/s11263-025-02371-0

Cong Y, Min MR, Li LE, Rosenhahn B, Yang MY. Attribute-Centric Compositional Text-to-Image Generation. International Journal of Computer Vision. 2025 Jul;133(7):4555-4570. Epub 2025 Mar 13. doi: 10.1007/s11263-025-02371-0

Cong, Yuren ; Min, Martin Renqiang ; Li, Li Erran et al. / Attribute-Centric Compositional Text-to-Image Generation. In: International Journal of Computer Vision. 2025 ; Vol. 133, No. 7. pp. 4555-4570.


@article{d162e00837fa4192afc7bfe69c6f227a,
title = "Attribute-Centric Compositional Text-to-Image Generation",
abstract = "Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model{\textquoteright}s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.",
keywords = "Attribute-centric, Compositional generation, Text-to-image",
author = "Yuren Cong and Min, {Martin Renqiang} and Li, {Li Erran} and Bodo Rosenhahn and Yang, {Michael Ying}",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = jul,
doi = "10.1007/s11263-025-02371-0",
language = "English",
volume = "133",
pages = "4555--4570",
journal = "International Journal of Computer Vision",
issn = "0920-5691",
publisher = "Springer Netherlands",
number = "7",
}


TY  - JOUR
T1  - Attribute-Centric Compositional Text-to-Image Generation
AU  - Cong, Yuren
AU  - Min, Martin Renqiang
AU  - Li, Li Erran
AU  - Rosenhahn, Bodo
AU  - Yang, Michael Ying
N1  - Publisher Copyright: © The Author(s) 2025.
PY  - 2025/7
Y1  - 2025/7
N2  - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
AB  - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
KW  - Attribute-centric
KW  - Compositional generation
KW  - Text-to-image
UR  - http://www.scopus.com/inward/record.url?scp=105000044048&partnerID=8YFLogxK
U2  - 10.1007/s11263-025-02371-0
DO  - 10.1007/s11263-025-02371-0
M3  - Article
AN  - SCOPUS:105000044048
VL  - 133
SP  - 4555
EP  - 4570
JO  - International Journal of Computer Vision
JF  - International Journal of Computer Vision
SN  - 0920-5691
IS  - 7
ER  -

By the same authors

Safe Resetless Reinforcement Learning: Enhancing Training Autonomy with Risk-Averse Agents

Guest Editorial: Special Issue on Multimodal Learning

PanoSCU: A Simulation-Based Dataset for Panoramic Indoor Scene Understanding

AutoML for Multi-Class Anomaly Compensation of Sensor Drift

CHOTA: A Higher Order Accuracy Metric for Cell Tracking