Details
Original language | English |
---|---|
Pages (from-to) | 4555-4570 |
Number of pages | 16 |
Journal | International Journal of Computer Vision |
Volume | 133 |
Issue number | 7 |
Early online date | 13 Mar 2025 |
Publication status | Published - Jul 2025 |
Abstract
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
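The abstract names an attribute-centric contrastive loss but does not give its formulation here. As a rough, hypothetical illustration only (not the paper's actual loss), an InfoNCE-style contrastive objective between image features and attribute-composition features could look like the PyTorch sketch below; all names (`attribute_contrastive_loss`, `img_feats`, `attr_feats`, `temperature`) are assumptions for illustration.

```python
# Hypothetical sketch only: a generic InfoNCE-style contrastive loss
# between image embeddings and attribute-composition embeddings.
# The paper's actual loss may be formulated differently.
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_feats: torch.Tensor,
                               attr_feats: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Pull each image embedding toward the embedding of its own
    attribute composition (the matching pair in the batch) and push it
    away from the attribute embeddings of all other samples."""
    img_feats = F.normalize(img_feats, dim=-1)    # unit norm -> cosine similarity
    attr_feats = F.normalize(attr_feats, dim=-1)
    logits = img_feats @ attr_feats.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(img_feats.size(0), device=img_feats.device)
    return F.cross_entropy(logits, targets)       # positives on the diagonal

# Example usage with random features (batch of 8, 256-dim embeddings):
# loss = attribute_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```

Under such a formulation, re-weighting or re-sampling batches so that overrepresented attribute compositions do not dominate the negatives would be one plausible way to discourage overfitting to them.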
Keywords
- Attribute-centric
- Compositional generation
- Text-to-image
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Computer Vision and Pattern Recognition
- Artificial Intelligence
Cite this
Attribute-Centric Compositional Text-to-Image Generation. / Cong, Yuren; Min, Martin Renqiang; Li, Li Erran; Rosenhahn, Bodo; Yang, Michael Ying. In: International Journal of Computer Vision, Vol. 133, No. 7, 07.2025, p. 4555-4570.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Attribute-Centric Compositional Text-to-Image Generation
AU - Cong, Yuren
AU - Min, Martin Renqiang
AU - Li, Li Erran
AU - Rosenhahn, Bodo
AU - Yang, Michael Ying
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/7
Y1 - 2025/7
N2 - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
AB - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
KW - Attribute-centric
KW - Compositional generation
KW - Text-to-image
UR - http://www.scopus.com/inward/record.url?scp=105000044048&partnerID=8YFLogxK
U2 - 10.1007/s11263-025-02371-0
DO - 10.1007/s11263-025-02371-0
M3 - Article
AN - SCOPUS:105000044048
VL - 133
SP - 4555
EP - 4570
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
SN - 0920-5691
IS - 7
ER -