Improving text retrieval for the routing problem using latent semantic indexing
SIGIR, pp. 282-291 (1994)
Abstract
Latent Semantic Indexing (LSI) is a novel approach to information retrieval that attempts to model the underlying structure of term associations by transforming the traditional representation of documents as vectors of weighted term frequencies to a new coordinate space where both documents and terms are represented as linear combinations...
Introduction
- The vector space model (VSM) [1], which measures the similarity between the query and each document by the weighted inner product of overlapping terms, has long been a standard in information retrieval.
- The VSM has its flaws, since it ignores both the order of and the associations between terms, but it is hard to find a better method with equivalent computational complexity.
- The method reduces the full term-document matrix to a small number of information-rich LSI vectors, which can be used in a traditional retrieval model or as the basis for more advanced statistical classification algorithms.
- The goal is to find the relevant documents in a new collection or the remaining relevant documents in the collection that the sample is drawn from
- This task is equivalent to the routing problem used for system evaluation at the TREC retrieval conference [3].
- One can imagine this task as the second stage in a retrieval algorithm, in place of the traditional strategy of relevance feedback [4].
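The weighted inner product that the VSM uses can be sketched as follows. This is a minimal illustration, not the paper's setup: the three toy documents, the helper names `tf_idf_vectors` and `inner_product`, and the particular tf-idf weighting are all assumptions chosen for brevity.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build sparse tf-idf vectors (dict: term -> weight) for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed weighting: log(n / df) + 1, so every term keeps a positive weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def inner_product(q, d):
    """Sum of weight products over the terms that q and d share."""
    return sum(w * d[t] for t, w in q.items() if t in d)

# Toy collection (hypothetical, for illustration only).
docs = [
    "car engine repair".split(),
    "automobile engine maintenance".split(),
    "stock market report".split(),
]
doc_vecs, idf = tf_idf_vectors(docs)

# The query is weighted with the same idf values; unknown terms get weight 0.
query = {t: idf.get(t, 0.0) for t in "car engine".split()}
scores = [inner_product(query, d) for d in doc_vecs]
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
```

In this toy collection the second document matches the query only through "engine": the VSM gives no credit for "automobile" being related to "car", which is the kind of missing term association the list above describes.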
Highlights
- We address the issue of whether Latent Semantic Indexing improves performance when applied to the routing task
- We examine an alternative application of Latent Semantic Indexing that can be used in conjunction with statistical classification to obtain a significant improvement in retrieval performance
- The vector space model has long been used as a basic framework for developing new retrieval methods
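One way to picture LSI vectors feeding a classifier for routing is the sketch below. It rests on several assumptions: the term-document matrix is toy data, the helper names `fit_lsi` and `project` are hypothetical, and the classifier is a simple difference-of-centroids linear scorer used as a stand-in, not the statistical classification method evaluated in the paper.

```python
import numpy as np

# Rows are terms, columns are training documents (toy data, not from the paper).
terms = ["car", "automobile", "engine", "stock", "market"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # stock
    [0, 0, 1, 0],   # market
], dtype=float)
# Relevance judgments for the training sample (assumed known for routing).
relevant = np.array([True, True, False, False])

def fit_lsi(A, k):
    """Return the k leading left singular vectors of the term-document matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

def project(U_k, term_vectors):
    """Fold term-space column vectors into the k-dim LSI space (one row per doc)."""
    return term_vectors.T @ U_k

U_k = fit_lsi(A, k=2)
train = project(U_k, A)

# Stand-in classifier: direction from the non-relevant to the relevant centroid.
w = train[relevant].mean(axis=0) - train[~relevant].mean(axis=0)

# Unseen documents, written one per row and transposed to terms x docs.
new_docs = np.array([
    [1, 1, 0, 0, 0],   # "car automobile"
    [0, 0, 0, 1, 0],   # "stock"
], dtype=float).T
routing_scores = project(U_k, new_docs) @ w
```

Because training and new documents pass through the same projection, the scores do not depend on the arbitrary sign of the singular vectors. Higher scores mean a document looks more like the relevant sample.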
Results
- If the retrieval strategy does not improve performance for the routing task, it will not produce good results for query-based information retrieval.
- The authors' experiments provide evidence that LSI slightly improves performance for the routing task.
- The authors address the issue of whether LSI improves performance when applied to the routing task
Conclusion
- The vector space model has long been used as a basic framework for developing new retrieval methods.
- It is difficult to devise a retrieval strategy that performs better with an equivalent amount of computation.
- The vector space model has some significant problems: it assumes that terms are independent and ignores term associations.
- Latent Semantic Indexing addresses this problem by re-expressing the term-document matrix in a new coordinate system designed to capture the most significant components of the term association structure
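The re-expression in a new coordinate system can be illustrated with a truncated singular value decomposition. The sketch below is an assumption-laden toy: the vocabulary, the four documents, and the choice k = 2 are invented for illustration. It shows two terms that never co-occur ending up with near-identical LSI term vectors because they share context.

```python
import numpy as np

# Rows are terms, columns are documents (hypothetical toy data).
terms = ["doctor", "physician", "hospital", "river", "boat"]
A = np.array([
    [1, 0, 0, 0],   # doctor
    [0, 1, 0, 0],   # physician
    [1, 1, 0, 0],   # hospital
    [0, 0, 1, 1],   # river
    [0, 0, 1, 0],   # boat
], dtype=float)

def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Truncated SVD: keep only the k largest singular values and their vectors.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :k] * s[:k]   # term coordinates in the k-dim space

i, j, r = terms.index("doctor"), terms.index("physician"), terms.index("river")

raw_sim = cosine(A[i], A[j])                   # rows of the original matrix
lsi_sim = cosine(term_vecs[i], term_vecs[j])   # same terms after LSI
lsi_unrelated = cosine(term_vecs[i], term_vecs[r])
```

In the original matrix "doctor" and "physician" share no document, so their raw similarity is zero. After truncation both terms load on the same dimension through their co-occurrence with "hospital", while "river" stays on a separate one.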
References
- Gerard Salton, editor. The SMART ing. Prentice-Hall, 1971.
- S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
- Donna Harman. Overview ence, pages 36–47, 1993.
- Gerard Salton and Christopher Improving retrieval performance by relevance Journal of the American Society for Information Science, 41(4):288-297, 1990.
- 5. Yonggang Qiu and H.P. Frei. Concept Conference, pages 160-169, 1993.
- 6. Hinrich 1992. Dimensions of meaning. In Proceedings of Supercomputing
- Conference, pages 107-115, 1993.
- 8. J. Friedman, J. Bentley, and R. Finkel. An algorithm for finding best matches in logarithmic time. ACM Transactions on Mathematical Software, 3(3):209-226, 1977.
- 9. G. Furnas, S. Deerwester, S. Dumais, T. Landauer, R. Harshman, Information retrieval using a singular value decomposition model Proc. of the 11th ACM/SIGIR Conference, pages 465-480, 1988.
- 10. B.T. Bartell, G.W. Cottrell, and R.K. Belew. Latent semantic indexing is an optimal special case of multidimensional scaling. In Proc. of the 15th ACM/SIGIR Conference, pages 161–167, 1992.
- Term-weighting approaches Processing and Management, 24(5):513-523, 1988.
- 13. M. Berry. Large scale singular cations, 6(1):13–49, 1992. International Journal of Supercomputer
- 14. David Hull. Using statistical testing in the evaluation Conference, pages 329-338, 1993.
- 15. Donna Harman. 1-10, 1993.
- 16. Geoffrey J. McLachlan. 341-346. Wiley, 1992.
- Conference, pages 202–210, 1991.
- Conference, pages 18-25, 1985.