Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation

IEEE transactions on neural networks and learning systems(2023)

引用 10|浏览44
暂无评分
摘要
Person image generation conditioned on natural language allows us to personalize image editing in a user-friendly manner. This fashion, however, involves different granularities of semantic relevance between texts and visual content. Given a sentence describing an unknown person, we propose a novel pose-guided multi-granularity attention architecture to synthesize the person image in an end-to-end manner. To determine what content to draw at a global outline, the sentence-level description and pose feature maps are incorporated into a U-Net architecture to generate a coarse person image. To further enhance the fine-grained details, we propose to draw the human body parts with highly correlated textual nouns and determine the spatial positions with respect to target pose points. Our model is premised on a conditional generative adversarial network (GAN) that translates language description into a realistic person image. The proposed model is coupled with two-stream discriminators: 1) text-relevant local discriminators to improve the fine-grained appearance by identifying the region–text correspondences at the finer manipulation and 2) a global full-body discriminator to regulate the generation via a pose-weighting feature selection. Extensive experiments conducted on benchmarks validate the superiority of our method for person image generation.
更多
查看译文
关键词
Visualization,Image synthesis,Generators,Generative adversarial networks,Electronic mail,Semantics,Computer science,Fine-grained generation,generative adversarial networks (GANs),human poses,person image generation,text-to-image translation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要