Text To Image Synthesis With Bidirectional Generative Adversarial Network

2020 IEEE International Conference on Multimedia and Expo (ICME), 2020

Cited by 19 | Viewed 32
Abstract
Generating realistic images from text descriptions is a challenging problem in computer vision. Although previous works have shown remarkable progress, guaranteeing semantic consistency between text descriptions and images remains difficult. To generate semantically consistent images, we propose two semantics-enhanced modules and a novel Textual-Visual Bidirectional Generative Adversarial Network (TVBi-GAN). Specifically, this paper proposes a semantics-enhanced attention module and a semantics-enhanced batch normalization module. These modules improve the semantic consistency of synthesized images by incorporating precise semantic features. In addition, we propose an encoder network to extract semantic features from images. During the adversarial process, the encoder guides our generator to explore the corresponding features behind descriptions. Through extensive experiments on the CUB and COCO datasets, we demonstrate that our TVBi-GAN outperforms state-of-the-art methods.
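The abstract does not specify how the semantics-enhanced batch normalization module works internally. As a rough illustration of the general idea it names (conditioning normalization parameters on text features, in the spirit of conditional batch normalization), here is a minimal NumPy sketch; the function name, projection matrices `w_gamma`/`w_beta`, and shapes are all hypothetical assumptions, not the paper's actual design:

```python
import numpy as np

def semantic_batch_norm(x, text_emb, w_gamma, w_beta, eps=1e-5):
    """Hypothetical sketch of semantics-conditioned batch normalization:
    the scale (gamma) and shift (beta) are predicted from a sentence
    embedding of the text description instead of being free parameters.

    x:        (N, C) batch of feature vectors
    text_emb: (D,)   sentence embedding of the description
    w_gamma, w_beta: (D, C) assumed linear projections
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # standard BN whitening
    gamma = 1.0 + text_emb @ w_gamma         # text-dependent scale
    beta = text_emb @ w_beta                 # text-dependent shift
    return gamma * x_hat + beta

# Toy usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))        # batch of 8 features, 16 channels
t = rng.normal(size=(32,))          # 32-dim text embedding
out = semantic_batch_norm(x, t,
                          rng.normal(size=(32, 16)) * 0.01,
                          rng.normal(size=(32, 16)) * 0.01)
print(out.shape)  # (8, 16)
```

The design intuition, common to conditional-normalization approaches, is that letting the text embedding modulate per-channel statistics injects semantic information at every normalization layer of the generator rather than only at its input.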
Keywords
Text-to-Image Synthesis