Report on the TREC 2005 Experiment: Genomics Track

Text REtrieval Conference (2007)

Summary: Because corruptions in the XML TREC Genomics collection were detected only a few days before the submission deadline, we were not able to submit runs for the ad hoc retrieval task (task I), although relevance judgements made after pooling were used to evaluate our approaches. This report therefore focuses mostly on the text categorization task (task II: triage and annotation).

Task I. Our approach uses thesaural resources (from the UMLS) together with a variant of the Porter stemmer for string normalization. Gene and Protein Entities (GPE) in the collection were marked up by simple dictionary lookup during indexing in order to avoid erroneous conflation: strings not found in the UMLS Specialist lexicon (augmented with various English lexical resources) were considered GPE and were moderately overweighted. Two weighting schemas were tested: first, a standard tf-idf with cosine normalization; second, a weighting based on the deviation from randomness model. For indexing the Genomics collection, the following MEDLINE fields were selected: article titles, MeSH and RN terms, and abstracts. We investigated high-precision strategies: our system returned only highly reliable documents, so some queries were left unanswered. Our best results achieved an average precision of 60%. This score was obtained using UMLS resources and GPE tagging together with a combination of a classical atc.ltn schema (following SMART notation) and a deviation from randomness (8) weighting.
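As an illustration of the atc.ltn schema mentioned above, the following minimal sketch computes document weights with augmented term frequency, idf, and cosine normalization (atc), and query weights with logarithmic term frequency, idf, and no normalization (ltn). It follows the usual reading of the SMART letters; the exact variant used in the experiments is not given in this summary. The toy collection, the function names, and the natural-log idf are assumptions made for the example, and the deviation from randomness component is not covered.

```python
import math
from collections import Counter


def atc_weights(doc_tokens, df, n_docs):
    """Document vector: augmented tf (a) x idf (t), cosine-normalized (c)."""
    tf = Counter(doc_tokens)
    max_tf = max(tf.values())
    raw = {
        term: (0.5 + 0.5 * count / max_tf) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    # Guard against an all-zero vector (every term occurs in every document).
    return {t: w / norm for t, w in raw.items()} if norm > 0 else raw


def ltn_weights(query_tokens, df, n_docs):
    """Query vector: logarithmic tf (l) x idf (t), no normalization (n)."""
    tf = Counter(query_tokens)
    return {
        term: (1.0 + math.log(count)) * math.log(n_docs / df[term])
        for term, count in tf.items()
        if term in df  # terms unseen in the collection contribute nothing
    }


def score(doc_vec, query_vec):
    """Inner product between a query vector and a document vector."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())


# Hypothetical toy collection (already tokenized and normalized).
docs = [
    ["brca1", "gene", "mutation", "breast", "cancer"],
    ["protein", "folding", "kinase", "gene"],
    ["cancer", "therapy", "trial"],
]
n_docs = len(docs)
# Document frequency: number of documents containing each term.
df = Counter(term for doc in docs for term in set(doc))

doc_vecs = [atc_weights(doc, df, n_docs) for doc in docs]
query_vec = ltn_weights(["brca1", "cancer"], df, n_docs)
scores = [score(vec, query_vec) for vec in doc_vecs]
```

In this toy setting the first document, which contains both query terms, ranks above the third, which contains only one; the second shares no term with the query and scores zero. A high-precision strategy such as the one described above could then return only documents whose score exceeds some threshold, at the cost of leaving some queries unanswered.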