ContextMatcher: Detector-Free Feature Matching with Cross-Modality Context

Dongyue Li, Songlin Du

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Existing feature matching methods tend to extract feature descriptors from visual appearance alone, which produces matches that are clearly wrong from a geometric perspective. This paper proposes ContextMatcher, which goes beyond visual appearance representation by introducing geometric context to guide feature matching. Specifically, ContextMatcher consists of visual descriptor generation, a neighborhood consensus module, and a geometric context encoder. To learn visual descriptors, Transformers in two separate branches are leveraged to produce feature descriptors. In one branch, convolutions are integrated into the self-attention layers to compensate for the lack of local structure information. In the other branch, a cross-scale Transformer is proposed that injects heterogeneous receptive field sizes into tokens. To leverage and aggregate geometric contextual information, a neighborhood consensus mechanism is proposed that re-ranks the initial pixel-level matches, imposing a geometric-consensus constraint on neighboring feature descriptors. Moreover, local feature descriptors are enhanced by combining them with the geometric properties of keypoints to refine matches to the sub-pixel level. Extensive experiments on relative pose estimation and image matching show that the proposed method outperforms existing state-of-the-art methods by a large margin.
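The neighborhood-consensus idea can be pictured with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes dense, L2-normalized descriptor maps on a coarse grid, builds the 4D correlation volume between the two images, and averages each match score over a local 4D neighborhood (fixed average pooling as a stand-in for whatever learned consensus filtering the paper uses, in the spirit of neighborhood-consensus networks), so that matches supported by consistent neighbors are promoted. The function name `rerank_matches` and all parameter choices are hypothetical.

import torch
import torch.nn.functional as F

def rerank_matches(desc_a: torch.Tensor, desc_b: torch.Tensor,
                   radius: int = 1) -> torch.Tensor:
    """desc_a, desc_b: (C, H, W) L2-normalized descriptor maps.
    Returns an (H*W, H*W) score matrix in which each pixel-level match
    score is blended with the mean score of neighboring match hypotheses,
    so matches agreeing with their spatial neighborhood are promoted."""
    C, H, W = desc_a.shape
    k = 2 * radius + 1
    # Initial pixel-level similarities: rows index pixels of A, columns of B.
    sim = desc_a.reshape(C, -1).t() @ desc_b.reshape(C, -1)          # (HW, HW)
    # Average over a k x k neighborhood in image B for every pixel of A.
    nbr = F.avg_pool2d(sim.reshape(H * W, 1, H, W), k, stride=1, padding=radius)
    # Then average over a k x k neighborhood in image A for every pixel of B.
    nbr = nbr.reshape(H, W, H * W).permute(2, 0, 1).unsqueeze(1)     # (HW_b, 1, H, W)
    nbr = F.avg_pool2d(nbr, k, stride=1, padding=radius)
    nbr = nbr.squeeze(1).permute(1, 2, 0).reshape(H * W, H * W)
    # Blend raw similarity with its 4D-neighborhood consensus support.
    return 0.5 * (sim + nbr)

# Toy usage on random descriptors from a 32x32 coarse grid.
da = F.normalize(torch.randn(128, 32, 32), dim=0)
db = F.normalize(torch.randn(128, 32, 32), dim=0)
scores = rerank_matches(da, db)
best_in_b = scores.argmax(dim=1)  # consensus-supported candidate per pixel of A

Re-ranking with a fixed pooled consensus keeps the sketch self-contained; a learned module (e.g., 4D convolutions over the correlation volume) would play the same role with trainable weights.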
Keywords
Local feature matching, Transformer, Feature extraction, Feature representation, Convolutional neural network, Neighborhood consensus