This Visual Transformer collection gathers papers on the theory and applications of Google's Transformer architecture in computer vision.
Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer
In addition to Multi-Object Tracking Accuracy (MOTA) and Identity F1 Score (IDF1), we report the following additional CLEAR multi-object tracking metrics: MT, the number of ground-truth tracks covered by the tracker for at least 80% of their life span.
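The MT metric above is simple to compute from per-track coverage fractions. Under the usual CLEAR MOT convention, a ground-truth track counts as Mostly Tracked (MT) when the tracker covers at least 80% of its frames, Mostly Lost (ML) when it covers less than 20%, and Partially Tracked (PT) otherwise. A minimal sketch (hypothetical helper, not the paper's evaluation code):

```python
def mot_track_categories(coverages):
    """Classify ground-truth tracks by the fraction of frames a tracker
    covers, following the CLEAR MOT convention:
      MT (mostly tracked)   : coverage >= 80%
      ML (mostly lost)      : coverage <  20%
      PT (partially tracked): everything in between
    `coverages` holds one fraction in [0, 1] per ground-truth track.
    Returns (MT, PT, ML) counts."""
    mt = sum(c >= 0.8 for c in coverages)
    ml = sum(c < 0.2 for c in coverages)
    pt = len(coverages) - mt - ml
    return mt, pt, ml
```

For example, `mot_track_categories([0.95, 0.5, 0.8, 0.1])` returns `(2, 1, 1)`.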
Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
We presented LinE segment TRansformers (LETR), a line segment detector based on a multi-scale encoder/decoder transformer structure.
We provide an analysis of open research directions and possible future work.
We propose VisualSparta, a simple yet effective text-to-image retrieval model that outperforms all existing retrieval models in both accuracy and retrieval latency.
Our Feature Pyramid Transformer does not change the size of the feature pyramid, and is generic and easy to plug-and-play with modern deep networks
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang
Transformers are becoming a hot topic in computer vision due to their competitive performance and tremendous potential compared to convolutional neural networks.
CVPR, pp. 5790–5799, 2020
The proposed texture transformer consists of a learnable texture extractor, which learns a joint feature embedding for further attention computation, and two attention-based modules, which transfer HR textures from the Ref image.
Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
Since our framework treats depth estimation as an auxiliary task for visual odometry, without special optimization, the improvement indicates that accurate camera pose estimation improves depth estimation in the proposed framework.
Transformer in Image Quality takes advantage of the inductive capability of a convolutional neural network architecture for quality feature derivation, and of a Transformer encoder for an aggregated representation via the attention mechanism.
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, Li Zhang
We can see that our SEgmentation TRansformer model (SETR-PUP) is superior to fully convolutional network baselines and to FCN-plus-attention approaches such as Non-local and CCNet, and its performance is on par with the best results reported so far.
Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo
The learned object query detects objects in the current frame, while the object feature query from the previous frame associates objects in the current frame with previous ones.
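The association step described above can be imitated outside the network with a simple similarity matching between the previous frame's object features and the current detections. A toy greedy sketch (the hypothetical `associate` helper below is not the paper's code; TransTrack performs this matching with learned queries inside the transformer decoder):

```python
import numpy as np

def associate(prev_feats, curr_feats):
    """Greedily match current-frame detections to previous-frame tracks
    by cosine similarity. A toy stand-in for learned object-feature-query
    association; not the paper's implementation.
    prev_feats: (n_tracks, d), curr_feats: (n_dets, d), rows non-zero.
    Returns {track_index: detection_index}."""
    prev = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    curr = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    sim = prev @ curr.T                      # pairwise cosine similarities
    matches, used = {}, set()
    for i in np.argsort(-sim.max(axis=1)):   # most confident tracks first
        for j in np.argsort(-sim[i]):        # best remaining detection
            if int(j) not in used:
                matches[int(i)] = int(j)
                used.add(int(j))
                break
    return matches
```

With `prev = np.eye(3)` and `curr = np.eye(3)[[2, 0, 1]]`, the result is `{0: 1, 1: 2, 2: 0}`: each track finds the detection carrying its feature.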
Sen Yang, Zhibin Quan, Mu Nie, Wankou Yang
TransPose models match the state of the art on the COCO Keypoint Detection task, which has been dominated by deep fully convolutional architectures, and there appears to be further room to raise the upper limit of model performance by increasing the size of TransPose.
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
For Data-efficient image Transformers (DeiT), we have only optimized the data augmentation and regularization strategies pre-existing for convnets, without introducing any significant architectural change beyond our novel distillation token.
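Mechanically, the distillation token mentioned above is one extra learnable token prepended to the input sequence: after the encoder, the class token's output is trained against the ground-truth label while the distillation token's output is trained to match a teacher's prediction. A shape-level sketch (illustrative only, not DeiT's implementation; the function name is hypothetical):

```python
import numpy as np

def build_deit_input(patch_tokens, cls_token, dist_token):
    """Prepend the class and distillation tokens to the patch tokens.
    patch_tokens: (n_patches, d); cls_token, dist_token: (d,) learnable.
    After the encoder, position 0 feeds the classification head and
    position 1 feeds the distillation head matched to a teacher."""
    return np.concatenate(
        [cls_token[None, :], dist_token[None, :], patch_tokens], axis=0
    )
```

For a 224x224 image split into 16x16 patches there are 196 patch tokens, so the encoder sees a sequence of length 198.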
Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang
This paper introduces Pointformer, a highly effective feature learning backbone for 3D point clouds that is permutation invariant to points in the input and learns local and global context-aware representations
Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk
We introduced Vision Transformer Faster R-CNN (ViT-FRCNN), a competitive object detection solution that utilizes a transformer backbone, suggesting that architectures sufficiently different from the well-studied CNN backbone can plausibly make progress on complex vision tasks.
Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This makes them expressive, but also computationally infeasible for long sequences, such as high-resolution images.
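The computational cost noted above comes from the attention score matrix: vanilla self-attention materializes an n x n matrix of pairwise scores, so compute and memory grow quadratically with sequence length (for images, with the number of pixels or patches). A minimal single-head sketch, simplified to omit the learned query/key/value projections:

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention over x of shape (n, d).
    Simplified: queries = keys = values = x, no learned projections.
    The (n, n) `scores` matrix is the quadratic bottleneck."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                  # (n, n) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x                             # (n, d) output
```

Doubling an image's side length quadruples n, and the score matrix grows by a factor of 16, which is why the paper's direction of compressing images into shorter discrete sequences matters.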
Hila Chefer, Shir Gur, Lior Wolf
Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks. In order to visualize the parts of the image that led to a certain classification, existing methods...
Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner
Our model can serve as a general framework for scene generation: a different task can be solved by changing the set of object properties or conditioning inputs
We first show that our method outperforms the previous state-of-the-art human mesh reconstruction methods on Human3.6M and 3DPW datasets
Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu
We propose a permutation-invariant point cloud transformer, which is suitable for learning on unstructured point clouds with irregular domain
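The permutation invariance highlighted above can be checked directly: self-attention is permutation-equivariant (reordering the input points reorders the per-point outputs the same way), and a symmetric pooling on top makes the final encoding order-independent. A toy sketch (illustrative only, not the paper's architecture; `attention_pool` is a hypothetical name):

```python
import numpy as np

def attention_pool(points):
    """Toy permutation-invariant encoder for a point cloud of shape (n, d):
    single-head self-attention (permutation-equivariant) followed by sum
    pooling (symmetric), so the output vector ignores point ordering."""
    d = points.shape[1]
    scores = points @ points.T / np.sqrt(d)        # (n, n) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)              # row-wise softmax
    attended = w @ points                          # per-point, order-equivariant
    return attended.sum(axis=0)                    # (d,), order-invariant
```

Feeding the same points in any order yields the same output vector, which is the property that makes attention a natural fit for unstructured point clouds.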