SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation
arXiv (2023)
Abstract
The ability to detect objects in images at varying scales has played a
pivotal role in the design of modern object detectors. Despite considerable
progress in removing hand-crafted components and simplifying the architecture
with transformers, multi-scale feature maps and/or pyramid design remain a key
factor for their empirical success. In this paper, we show that this reliance
on either feature pyramids or a hierarchical backbone is unnecessary: a
transformer-based detector with scale-aware attention enables the plain
detector SimPLR, whose backbone and detection head are both non-hierarchical
and operate on single-scale features. Our experiments show that
SimPLR with scale-aware attention is plain and simple, yet competitive with
multi-scale vision transformer alternatives. Compared with multi-scale and
single-scale state-of-the-art detectors, our model scales much better with
higher-capacity (self-supervised) models and more pre-training data, allowing
us to report consistently better accuracy and faster runtime for object
detection, instance segmentation, and panoptic segmentation. Code will be released.