StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
CoRR(2024)
摘要
To leverage LLMs for visual synthesis, traditional methods convert raster
image information into discrete grid tokens through specialized visual modules,
while disrupting the model's ability to capture the true semantic
representation of visual scenes. This paper posits that an alternative
representation of images, vector graphics, can effectively surmount this
limitation by enabling a more natural and semantically coherent segmentation of
the image information. Thus, we introduce StrokeNUWA, a pioneering work
exploring a better visual representation ”stroke tokens” on vector graphics,
which is inherently visual semantics rich, naturally compatible with LLMs, and
highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly
surpass traditional LLM-based and optimization-based methods across various
metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up
to a 94x speedup in inference over the speed of prior methods with an
exceptional SVG code compression ratio of 6.9
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要