Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence
arXiv (2024)
Abstract
This paper introduces a Transformer-based integrative feature and cost
aggregation network designed for dense matching tasks. In the context of dense
matching, many works benefit from one of two forms of aggregation: feature
aggregation, which pertains to the alignment of similar features, or cost
aggregation, a procedure aimed at instilling coherence in the flow estimates
across neighboring pixels. In this work, we first show that feature aggregation
and cost aggregation exhibit distinct characteristics and reveal the potential
for substantial benefits stemming from the judicious use of both aggregation
processes. We then introduce a simple yet effective architecture that harnesses
self- and cross-attention mechanisms to show that our approach unifies feature
aggregation and cost aggregation and effectively harnesses the strengths of
both techniques. Within the proposed attention layers, the features and cost
volume both complement each other, and the attention layers are interleaved
through a coarse-to-fine design to further promote accurate correspondence
estimation. Finally, at inference, our network produces multi-scale predictions,
computes their confidence scores, and selects the most confident flow for final
prediction. Our framework is evaluated on standard benchmarks for semantic
matching, and also applied to geometric matching, where we show that our
approach achieves significant improvements compared to existing methods.
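The inference-time selection described above can be sketched in a few lines. The sketch below is an illustration only: the confidence measure (here, the mean peak matching probability of each scale's correlation map) and all function names are assumptions for the sake of a runnable example, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def confidence_from_correlation(corr):
    """Scalar confidence for one scale's prediction (assumed measure):
    the mean, over source pixels, of the peak matching probability.

    corr: (N, M) correlation matrix, one row per source pixel.
    """
    return float(softmax(corr, axis=-1).max(axis=-1).mean())

def select_most_confident_flow(flows, corrs):
    """Pick the flow whose correlation map yields the highest confidence.

    flows: list of (H, W, 2) flow fields, assumed already upsampled to a
           common resolution.
    corrs: list of (N, M) correlation matrices, one per flow.
    """
    scores = [confidence_from_correlation(c) for c in corrs]
    best = int(np.argmax(scores))
    return flows[best], best

# Toy example: a flat correlation (ambiguous matches) vs. a sharply
# peaked one (confident matches); the peaked scale should win.
corr_flat = np.zeros((8, 8))          # uniform matching probabilities
corr_sharp = np.eye(8) * 10.0         # one dominant match per pixel
flows = [np.zeros((2, 2, 2)), np.ones((2, 2, 2))]
flow, idx = select_most_confident_flow(flows, [corr_flat, corr_sharp])
```

The same pattern extends to any number of scales: each scale contributes a candidate flow and a scalar confidence, and the argmax decides the final prediction.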