Efficient Filtration of Sequence Similarity Search Through Singular Value Decomposition

SA Aghili, OD Sahin, D Agrawal, A El Abbadi

Proceedings Fourth IEEE Symposium on Bioinformatics and Bioengineering

Abstract

Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to mitigate the curse of dimensionality. This paper proposes a novel approach that maps the problem of whole-genome sequence similarity search to approximate vector comparison in the well-established multidimensional vector space. We propose applying the singular value decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to reduce both the search space and the running time of the search operation. Our empirical results on prokaryote and eukaryote DNA contig datasets demonstrate effective filtration that prunes non-relevant portions of the database, with running times up to 2.3 times faster than the q-gram approach. SVD filtration can easily be integrated as a pre-processing step into any of the well-known sequence search heuristics, such as BLAST, QUASAR and FastA. We analyze the precision of SVD filtration as a transformation-based dimensionality reduction technique and discuss the trade-offs it imposes.
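The abstract describes the filtration pipeline only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation: each sequence is assumed to be mapped to a q-gram frequency vector (the abstract does not give the exact mapping), the database matrix is projected onto its top-k right singular vectors, and sequences whose reduced-space Euclidean distance to the query exceeds a threshold `eps` are pruned before a full search tool such as BLAST runs on the survivors. The function names `qgram_vector`, `svd_basis` and `svd_filter` and the parameters `q`, `k` and `eps` are hypothetical.

```python
import itertools

import numpy as np

ALPHABET = "ACGT"


def qgram_vector(seq: str, q: int) -> np.ndarray:
    """Count overlapping q-grams of seq over {A,C,G,T}.

    Assumption: a q-gram frequency vector is used as the sequence-to-vector
    mapping; the paper's exact mapping is not stated in the abstract.
    """
    index = {"".join(p): i for i, p in enumerate(itertools.product(ALPHABET, repeat=q))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - q + 1):
        gram = seq[i:i + q]
        if gram in index:  # skip q-grams containing N or other non-ACGT symbols
            vec[index[gram]] += 1
    return vec


def svd_basis(db_vectors: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k right singular vectors of the database matrix as columns."""
    _, _, vt = np.linalg.svd(db_vectors, full_matrices=False)
    return vt[:k].T  # shape (dimension, k); columns are orthonormal


def svd_filter(db_vectors: np.ndarray, query_vec: np.ndarray,
               basis: np.ndarray, eps: float) -> list:
    """Indices of database rows whose reduced-space distance to the query is <= eps.

    Projection onto orthonormal columns never increases Euclidean distance,
    so no row within eps in the full q-gram space is pruned.
    """
    db_reduced = db_vectors @ basis
    q_reduced = query_vec @ basis
    dists = np.linalg.norm(db_reduced - q_reduced, axis=1)
    return [int(i) for i in np.flatnonzero(dists <= eps)]
```

For example, with `M = np.array([qgram_vector(s, 2) for s in db])` over a list of contigs `db`, calling `svd_filter(M, qgram_vector(query, 2), svd_basis(M, k=2), eps=3.0)` returns the candidate indices to pass on to the full search. Because the reduced-space distance lower-bounds the full-space distance, this filter has no false dismissals with respect to Euclidean distance on the q-gram vectors; it makes no guarantee about alignment scores, and a smaller `k` prunes less aggressively while computing faster.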

Keywords

Approximate String Search, Sequence Homology, Singular Value Decomposition, bioinformatics, comparative genomics