InDecGAN: Learning to Generate Complex Images From Captions via Independent Object-Level Decomposition and Enhancement

IEEE Transactions on Multimedia (2023)

Abstract
Text-to-image synthesis is a challenging problem: a complex scene contains diverse objects of various sizes, and objects of the same class appear in diverse forms when viewed from different perspectives. Synthesis models therefore have difficulty capturing the varied objects in a complex scene. To alleviate these problems, we devise an independent object-level decomposing and enhancing generative adversarial network, denoted InDecGAN, to synthesize complex images and capture the varied objects in a complex scene. Specifically, InDecGAN fully exploits independent object-level information during training (bounding boxes and high-resolution images of objects) by employing independent object-level pathways to synthesize varied objects. The independent object-level pathway integrates an independent object-level adversarial loss with the bounding-box information to learn the visual features of objects independently; the main pathway then exploits the features provided by the object-level pathway to compose the full scene and synthesize the image. In addition, we analyze the generalization properties of the proposed InDecGAN and demonstrate the improvement from the perspective of the model architecture. Moreover, extensive experiments on a widely used dataset demonstrate that the proposed model with an independent object-level pathway produces synthesized images of significantly improved quality.
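The abstract describes a two-pathway design: object-level pathways generate per-object features placed according to bounding boxes, and a main pathway composes them into the full scene. The paper's code is not given here; the following is a minimal NumPy sketch of that composition idea only. All function names, tensor shapes, and the feature-combination rule are illustrative assumptions, not the authors' implementation, and the adversarial losses are omitted entirely.

```python
import numpy as np

def object_pathway(noise, bbox, canvas_size=(64, 64), feat_dim=4):
    """Hypothetical object-level pathway: produce a feature patch for one
    object and place it on an empty canvas at its bounding box."""
    x, y, w, h = bbox
    rng = np.random.default_rng(0)
    # Stand-in for a learned generator: noise-conditioned feature patch.
    patch = np.tanh(rng.standard_normal((h, w, feat_dim)) + noise)
    canvas = np.zeros((*canvas_size, feat_dim))
    canvas[y:y + h, x:x + w] = patch  # bounding box controls placement/size
    return canvas

def main_pathway(object_maps):
    """Hypothetical main pathway: compose per-object feature maps into one
    scene-level map (here, a simple sum squashed to [-1, 1])."""
    return np.tanh(sum(object_maps))

# Two objects with different boxes, composed into one scene map.
objs = [object_pathway(0.1, (5, 5, 16, 16)),
        object_pathway(-0.2, (30, 20, 12, 12))]
img = main_pathway(objs)
```

In the actual model the patch generator and the composition step would be trained networks, with the object-level adversarial loss applied to the per-object outputs before the main pathway renders the final image.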
Key words
Layout, Task analysis, Generators, Shape, Semantics, Generative adversarial networks, Image synthesis, Complex scene, independent object-level pathway, size information, text-to-image synthesis