Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
International Joint Conference on Artificial Intelligence (2024)
Keywords
Computer Vision -> CV: Vision, language and reasoning; Computer Vision -> CV: Multimodal learning; Machine Learning -> ML: Multi-modal learning