Detecting near-duplicate document images using interest point matching

Shiv Naga Prasad Vitaladevuni, Fred Choi, Rohit Prasad, Premkumar Natarajan

Pattern Recognition (2012)

Abstract

We present an approach to detecting near-duplicate document images using SIFT interest point matching. Given a set of document images, SIFT features are extracted from each image and stored in a kd-tree, which serves as the feature database. The near-duplicates of a query image are estimated by directly matching its SIFT descriptors against this database. We demonstrate the approach on a challenging set of more than 16,000 unconstrained Arabic hand- and machine-written document images obtained from the field. Our experiments indicate that the approach detects near-duplicates with a low false alarm rate and outperforms a bag-of-words-based approach.
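The pipeline in the abstract (descriptors stored in a kd-tree, query descriptors matched directly against it) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: random 8-dimensional vectors stand in for 128-dimensional SIFT descriptors, the kd-tree is a small pure-Python one, and the vote counting and the `max_sq_dist` threshold are hypothetical choices, since the abstract does not say how matches are scored.

```python
import random
from collections import Counter


def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (descriptor, doc_id) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Exact nearest neighbour; returns (squared_distance, doc_id) or None."""
    if node is None:
        return best
    desc, doc_id = node["point"]
    d = sq_dist(desc, query)
    if best is None or d < best[0]:
        best = (d, doc_id)
    diff = query[node["axis"]] - desc[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def rank_documents(tree, query_descriptors, max_sq_dist):
    """Hypothetical scoring: each close-enough nearest neighbour votes for its document."""
    votes = Counter()
    for q in query_descriptors:
        match = nearest(tree, q)
        if match is not None and match[0] <= max_sq_dist:
            votes[match[1]] += 1
    return votes.most_common()


# Synthetic stand-in data: three "documents" with 20 descriptors each.
rng = random.Random(0)
DIM = 8  # assumption; real SIFT descriptors have 128 dimensions
database = {
    name: [tuple(rng.random() for _ in range(DIM)) for _ in range(20)]
    for name in ("doc_a", "doc_b", "doc_c")
}
tree = build_kdtree([(d, name) for name, descs in database.items() for d in descs])

# The query is a slightly perturbed copy of doc_b, i.e. a near-duplicate.
query = [tuple(x + rng.gauss(0, 0.005) for x in d) for d in database["doc_b"]]
ranking = rank_documents(tree, query, max_sq_dist=0.01)
print(ranking)
```

Documents with many votes are near-duplicate candidates. A real system would extract the descriptors with an actual SIFT implementation and would likely use an approximate nearest-neighbour search to handle a collection of this size.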

Keywords

document image processing, feature extraction, image matching, natural language processing, tree data structures, SIFT descriptors, SIFT feature extraction, SIFT interest point matching, bag-of-words-based approach, false alarm rate, feature database, kd-tree storage, machine written images, near-duplicate document image detection, query image estimation, unconstrained Arabic hand
