MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
arXiv (Cornell University) (2021)
Keywords
masked vision-and-language, modeling, pre-training