
DLA-Net for FG-SBIR: Dynamic Local Aligned Network for Fine-Grained Sketch-Based Image Retrieval

International Multimedia Conference (2021)

Abstract

Fine-grained sketch-based image retrieval (FG-SBIR) is considered an ideal alternative to keyword-based image retrieval and search-by-image, because sketches are rich in detail and easy to obtain. Previous works generally follow a paradigm of first extracting a global image feature with a convolutional neural network and then optimizing the model with a triplet loss. These works have made many efforts to narrow the domain gap and to extract discriminative features. However, they overlook the fact that a global feature is poor at capturing fine-grained details. In this paper, we argue that local features are more discriminative than the global feature in FG-SBIR, and we explore an effective way to exploit them. Specifically, we first propose the Local Aligned Network (LA-Net), which solves FG-SBIR by directly aligning mid-level local features. Experiments show that it outperforms all previous baselines and is easy to implement; we hope LA-Net will serve as a new strong baseline for FG-SBIR. Next, we propose the Dynamic Local Aligned Network (DLA-Net) to enhance LA-Net. LA-Net does not account for the spatial misalignment caused by the abstract nature of sketches. To address this problem, we introduce a dynamic alignment mechanism into LA-Net, which lets the sketch interact with the photo and dynamically decide where to align depending on the photo being compared. Experiments indicate that DLA-Net successfully addresses spatial misalignment: it gains a significant performance boost over LA-Net and outperforms the state of the art in FG-SBIR. To the best of our knowledge, DLA-Net is the first model to beat humans on all three datasets: QMUL FG-SBIR, QMUL Handbag, and Sketchy.
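The abstract does not spell out how LA-Net or DLA-Net compute their distances, so the following is only a minimal sketch of the general idea, under stated assumptions. It contrasts a fixed, position-by-position comparison of local features with a "dynamic" comparison in which each sketch local feature matches whichever photo local feature is closest, and it plugs either distance into a standard hinge triplet loss. All names (`static_local_distance`, `dynamic_local_distance`, `triplet_loss`), the flattened feature-map layout, the Euclidean metric, the nearest-neighbour matching and the margin of 0.3 are hypothetical choices for illustration, not the paper's actual mechanism.

```python
import math
from typing import Callable, List

# Hypothetical layout: a feature map is a flattened list of local feature
# vectors, one per spatial position (H*W positions, each C-dimensional).
FeatureMap = List[List[float]]


def static_local_distance(sketch: FeatureMap, photo: FeatureMap) -> float:
    """Fixed alignment: compare local features at the SAME spatial position."""
    if len(sketch) != len(photo):
        raise ValueError("feature maps must have the same number of positions")
    return sum(math.dist(s, p) for s, p in zip(sketch, photo)) / len(sketch)


def dynamic_local_distance(sketch: FeatureMap, photo: FeatureMap) -> float:
    """Assumed stand-in for dynamic alignment: each sketch local feature is
    matched to its nearest photo local feature, so where it aligns depends
    on the photo being compared rather than on a fixed grid position."""
    return sum(min(math.dist(s, p) for p in photo) for s in sketch) / len(sketch)


def triplet_loss(anchor: FeatureMap, positive: FeatureMap, negative: FeatureMap,
                 distance: Callable[[FeatureMap, FeatureMap], float],
                 margin: float = 0.3) -> float:
    """Hinge triplet loss: the matching photo should be closer to the sketch
    than the non-matching photo by at least `margin`."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


# Toy example: the matching photo contains the same two local parts as the
# sketch, but at swapped positions (a crude model of spatial misalignment).
sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_shifted = [[0.0, 1.0], [1.0, 0.0]]
negative = [[1.0, 0.5], [0.0, 1.5]]

print(static_local_distance(sketch, photo_shifted))   # ≈ 1.414 (sqrt 2)
print(dynamic_local_distance(sketch, photo_shifted))  # 0.0
print(triplet_loss(sketch, photo_shifted, negative, static_local_distance))   # ≈ 1.214
print(triplet_loss(sketch, photo_shifted, negative, dynamic_local_distance))  # 0.0
```

In this toy case the fixed alignment penalizes the correct photo simply because its parts sit at different positions, so the triplet constraint is violated; letting each sketch feature choose its match removes that penalty. A real model would operate on CNN feature maps in batches and would likely learn the alignment rather than use a hard nearest-neighbour rule.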