Detecting near-duplicate document images using interest point matching

Pattern Recognition (2012)

Abstract
We present an approach to detecting near-duplicate document images using SIFT interest point matching. Given a set of document images, SIFT features are extracted from each image and stored in a kd-tree, which serves as the feature database. The near-duplicates of a query image are estimated by directly matching its SIFT descriptors against the feature database. We demonstrate the approach on a challenging set of more than 16,000 unconstrained Arabic hand- and machine-written document images obtained from the field. Our experiments indicate that the approach detects near-duplicates with a low false alarm rate and outperforms a bag-of-words-based approach.
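The pipeline described in the abstract (descriptors stored in a kd-tree, then matched directly against a query) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes descriptors have already been extracted, uses short synthetic vectors in place of 128-dimensional SIFT descriptors, and ranks database images with a simple nearest-neighbour vote and a distance cutoff. The names `build_kdtree`, `nearest`, `rank_near_duplicates`, and the `max_dist` parameter are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def build_kdtree(points, depth=0):
    """Build a kd-tree from (descriptor, image_id) pairs; returns nested tuples."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Exact nearest-neighbour search; returns (distance, image_id)."""
    if node is None:
        return best
    (vec, image_id), axis, left, right = node
    dist = math.dist(vec, query)
    if best is None or dist < best[0]:
        best = (dist, image_id)
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def rank_near_duplicates(tree, query_descriptors, max_dist):
    """Vote for the image owning each query descriptor's nearest neighbour.

    Assumption (not from the paper): a match counts only if its distance is
    at most max_dist, and images are scored by the fraction of matched
    query descriptors.
    """
    votes = Counter()
    for desc in query_descriptors:
        dist, image_id = nearest(tree, desc)
        if dist <= max_dist:
            votes[image_id] += 1
    n = len(query_descriptors)
    return [(image_id, count / n) for image_id, count in votes.most_common()]


# Synthetic stand-ins for extracted descriptors (real SIFT vectors have 128 dims).
rng = random.Random(0)
dim = 8
database = {name: [[rng.random() for _ in range(dim)] for _ in range(20)]
            for name in ("doc_a", "doc_b", "doc_c")}
tree = build_kdtree([(vec, name) for name, vecs in database.items() for vec in vecs])

# The query is a slightly perturbed copy of doc_b, imitating a near-duplicate scan.
query = [[x + rng.uniform(-0.01, 0.01) for x in vec] for vec in database["doc_b"]]
results = rank_near_duplicates(tree, query, max_dist=0.1)
print(results)
```

With real data, the descriptors would come from a SIFT extractor, and at 128 dimensions an approximate kd-tree search is generally preferable to the exact search shown here, since exact kd-tree lookups lose most of their advantage in high dimensions.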
Keywords
document image processing, feature extraction, image matching, natural language processing, tree data structures, SIFT descriptors, SIFT feature extraction, SIFT interest point matching, bag-of-words-based approach, false alarm rate, feature database, kd-tree storage, machine written images, near-duplicate document image detection, query image estimation, unconstrained Arabic hand