Plugging Stylized Controls in Open-Stylized Image Captioning

Pattern Recognition and Computer Vision, PRCV 2023, Pt. I (2024)

Abstract
Image captioning is a classical multi-modal task for vision-language understanding. In recent years, researchers have begun to focus on generating captions with personalized styles, but the range of available styles is often fixed. Existing methods for stylized image captioning rely mainly on reinforcement learning or contrastive learning. Even with the assistance of large models such as CLIP and GPT, previous methods still require fine-tuning to generate captions in a target style, which incurs computational and training costs. In this paper, we design a Plug-in Stylized Controls Module (PSCM), which can be inserted directly into the text-generation procedure of a well-trained model to generate open-stylized captions. Specifically, PSCM uses a style factor and a fluency factor to guide the text-generation decoder. The style factor steers the generated text toward a specified style, while the fluency factor improves the fluency of the generated text. PSCM is a straightforward yet effective plug-and-play module that can readily produce open-stylized captions for each image without fine-tuning the backbone and without image-text paired training data in the target style. We add PSCM to two existing stylized image captioning models and conduct experiments on four datasets to demonstrate the effectiveness of the module.
Keywords
open-stylized image captioning, language modelling
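
The abstract says that a style factor and a fluency factor guide the decoder of a frozen captioning model, but it does not say how the two factors are computed or combined. The sketch below is therefore only an illustration of the general plug-in idea, not the paper's method. It assumes an additive combination of the decoder's log-probabilities with per-token style and fluency scores. The function name rescore_next_token, the weights alpha and beta, and all scores in the usage example are hypothetical.

```python
import math


def rescore_next_token(base_logprobs, style_scores, fluency_scores,
                       alpha=1.0, beta=1.0):
    """Pick the next token after adding style and fluency guidance.

    ASSUMPTION: additive weighting. The abstract does not specify how
    PSCM combines its factors; this is an illustrative choice only.

    base_logprobs:  dict token -> log-probability from the frozen decoder
    style_scores:   dict token -> hypothetical style score (higher = more on-style)
    fluency_scores: dict token -> hypothetical fluency score (higher = more fluent)
    """
    combined = {}
    for token, logprob in base_logprobs.items():
        combined[token] = (
            logprob
            + alpha * style_scores.get(token, 0.0)
            + beta * fluency_scores.get(token, 0.0)
        )
    # Greedy choice over the guided scores; the decoder weights are untouched.
    best_token = max(combined, key=combined.get)
    return best_token, combined


# Made-up candidate tokens and scores, for illustration only.
base = {"dog": math.log(0.6), "pup": math.log(0.3), "canine": math.log(0.1)}
style = {"dog": 0.0, "pup": 1.0, "canine": 0.2}
fluency = {"dog": 0.1, "pup": 0.1, "canine": -0.5}

unguided, _ = rescore_next_token(base, style, fluency, alpha=0.0, beta=0.0)
guided, _ = rescore_next_token(base, style, fluency, alpha=1.0, beta=1.0)
```

With both weights at zero the choice falls back to the decoder's own preference, and with guidance switched on the style score can override it. Because the guidance is applied only at decoding time, no parameters of the captioning model are updated, which is the sense in which such a module is plug-and-play.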