Bootstrapping SparseFormers from Vision Foundation Models
Computer Vision and Pattern Recognition (2024)
Keywords
Foundation Model, Computational Cost, Bootstrap Procedure, Language Model, Large-scale Models, Final Representation, Multimodal Model, Vision Transformer, Visual Understanding, Classification Model, Learning Rate, High Throughput, Input Image, Data Augmentation, Latent Space, Question Answering, Semantic Segmentation, Linear Layer, Self-supervised Learning, Pre-trained Weights, Token Embedding, Transformer Block, Learning Rate Set, Transformer Architecture, Visual Encoding, Transformer Encoder, Foreground Objects, Distillation Method