Details
Original language | English |
---|---|
Pages (from - to) | 4555-4570 |
Number of pages | 16 |
Journal | International Journal of Computer Vision |
Volume | 133 |
Issue number | 7 |
Early online date | 13 March 2025 |
Publication status | Published - July 2025 |
Abstract
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
ASJC Scopus subject areas
- Computer Science (all)
  - Software
  - Computer Vision and Pattern Recognition
  - Artificial Intelligence
Cite this
In: International Journal of Computer Vision, Vol. 133, No. 7, 07.2025, pp. 4555-4570.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Attribute-Centric Compositional Text-to-Image Generation
AU - Cong, Yuren
AU - Min, Martin Renqiang
AU - Li, Li Erran
AU - Rosenhahn, Bodo
AU - Yang, Michael Ying
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/7
Y1 - 2025/7
N2 - Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
KW - Attribute-centric
KW - Compositional generation
KW - Text-to-image
UR - http://www.scopus.com/inward/record.url?scp=105000044048&partnerID=8YFLogxK
U2 - 10.1007/s11263-025-02371-0
DO - 10.1007/s11263-025-02371-0
M3 - Article
AN - SCOPUS:105000044048
VL - 133
SP - 4555
EP - 4570
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
SN - 0920-5691
IS - 7
ER -