TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

Aditya Sanghi, Rao Fu, Vivian Liu, Karl D. D. Willis, Hooman Shayani, Amir Hosein Khasahmadi, Srinath Sridhar, Daniel Ritchie

arXiv (Cornell University) (2022)

Abstract

Language is one of the primary means by which we describe the 3D world around us. While rapid progress has been made in text-to-2D-image synthesis, similar progress in text-to-3D-shape synthesis has been hindered by the lack of paired (text, shape) data. Moreover, extant methods for text-to-shape generation have limited shape diversity and fidelity. We introduce TextCraft, a method that addresses these limitations by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs for training. TextCraft achieves this by using CLIP together with a multi-resolution approach: it first generates a shape in a low-dimensional latent space and then upscales it to a higher resolution, which improves the fidelity of the generated shape. To improve shape diversity, we use a discrete latent space modelled by a bidirectional transformer conditioned on the interchangeable image-text embedding space induced by CLIP. Moreover, we present a novel variant of classifier-free guidance, which further improves the accuracy-diversity trade-off. Finally, we perform extensive experiments that demonstrate that TextCraft outperforms state-of-the-art baselines.
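
For readers unfamiliar with classifier-free guidance, the following is a minimal sketch of the standard formulation applied to logits over discrete latent tokens. It is not the novel variant the abstract refers to, which is not described here. The function names, the toy logit values, and the guidance scales are all assumptions chosen for illustration.

```python
import math

def guided_logits(cond, uncond, scale):
    """Standard classifier-free guidance (assumed form, not the paper's variant).

    Moves from the unconditional logits toward the conditional ones;
    scale = 1.0 returns the conditional logits, scale = 0.0 the unconditional.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three hypothetical latent tokens.
cond = [2.0, 0.5, -1.0]    # model output given the text/image embedding
uncond = [1.0, 1.0, 1.0]   # model output with the condition dropped

probs_plain = softmax(guided_logits(cond, uncond, 1.0))
probs_guided = softmax(guided_logits(cond, uncond, 3.0))
```

In this sketch a larger scale concentrates probability on the tokens the condition favours, which tends to raise accuracy at the cost of diversity; that is the trade-off the abstract says the proposed variant improves.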

Keywords

diverse shapes, TextCraft, generation, zero-shot, high-fidelity
