We conducted a case study which indicates that a semi-automated approach can achieve categorization performance close to the manual, expert system approach of building text categorization systems

Feature selection, perceptron learning, and a usability case study for text categorization

Special Interest Group on Information Retrieval, no. SI (1997): 67-73


Abstract

In this paper, we describe an automated learning approach to text categorization based on perceptron learning and a new feature selection metric, called correlation coefficient. Our approach has been tested on the standard Reuters text categorization collection. E...

Introduction
  • The phenomenal growth of the Internet has resulted in the availability of huge amounts of online information.
  • Much of this information is in the form of natural language texts.
  • A computer system that can categorize real-world, unrestricted English texts into a predefined set of categories would be most useful.
  • When tested on the standard Reuters text categorization collection, the approach outperforms the best published results on this Reuters corpus.
Highlights
  • We live in a world of information explosion.
  • We present an automated learning approach to building a robust, efficient and practical text categorization system, called CLASSI, using the perceptron learning algorithm.
  • We describe a new feature selection metric, called correlation coefficient, which yields considerable improvement in categorization accuracy.
  • Our evaluation has shown that CLASSI outperforms existing approaches on the standard Reuters corpus.
  • We conducted a case study which indicates that a semi-automated approach can achieve categorization performance close to the manual, expert system approach of building text categorization systems.
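
The perceptron learning mentioned in the highlights can be illustrated with a minimal sketch. This is the textbook mistake-driven perceptron for a single category, not necessarily the exact variant used in CLASSI; the function names, the sparse word-to-value document representation, and the `epochs` and `lr` parameters are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A document is a sparse map from feature word to value (e.g. term frequency).
Doc = Dict[str, float]


def score(weights: Dict[str, float], bias: float, features: Doc) -> float:
    """Linear score: bias plus the weighted sum of the document's features."""
    return bias + sum(weights.get(w, 0.0) * v for w, v in features.items())


def predict(weights: Dict[str, float], bias: float, features: Doc) -> int:
    """Return +1 (document is in the category) or -1 (it is not)."""
    return 1 if score(weights, bias, features) > 0 else -1


def train_perceptron(
    examples: List[Tuple[Doc, int]], epochs: int = 10, lr: float = 1.0
) -> Tuple[Dict[str, float], float]:
    """Textbook perceptron: update weights only on misclassified examples.

    Labels must be +1 or -1. Training stops early once an epoch
    produces no mistakes.
    """
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            if predict(weights, bias, features) != label:
                # Move the decision boundary toward the correct side.
                for w, v in features.items():
                    weights[w] = weights.get(w, 0.0) + lr * label * v
                bias += lr * label
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Toy usage: a hypothetical "grain" category with four training documents.
examples = [
    ({"wheat": 1.0, "grain": 1.0}, 1),
    ({"oil": 1.0, "crude": 1.0}, -1),
    ({"wheat": 1.0, "export": 1.0}, 1),
    ({"oil": 1.0, "export": 1.0}, -1),
]
weights, bias = train_perceptron(examples)
```

In a multi-category setting, one such binary classifier would typically be trained per category.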
Results
  • By manually modifying and augmenting the set of words used as features in a topic categorizer, the authors achieve accuracy very close to the manual rule-based approach.
  • The authors achieved an F-measure accuracy of 0.522, which is still substantially lower than the accuracy of 0.733 achieved by TCS.
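
For readers unfamiliar with the F-measure reported above, the following sketch computes it from raw counts of true positives, false positives, and false negatives. The page does not say which weighting the authors used, so the balanced setting (`beta = 1`) is an assumption, and the function name is illustrative.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure: weighted harmonic mean of precision and recall.

    tp: documents correctly assigned to the category
    fp: documents wrongly assigned to the category
    fn: documents of the category that were missed
    beta: relative weight of recall (1.0 weights both equally; assumed here)
    """
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy usage: 6 correct assignments, 2 wrong assignments, 4 misses.
example_f1 = f_measure(tp=6, fp=2, fn=4)
```

With these toy counts, precision is 0.75 and recall is 0.6, so the balanced F-measure is 2/3.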
Conclusion
  • The authors have successfully built a robust, efficient and practical text categorization system, CLASSI, using the perceptron learning algorithm.
  • The authors' evaluation has shown that CLASSI outperforms existing approaches on the standard Reuters corpus.
  • The use of a new correlation coefficient in feature selection results in considerable improvement in categorization performance.
  • The authors conducted a case study which indicates that a semi-automated approach can achieve categorization performance close to the manual, expert system approach of building text categorization systems.
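
This page does not state the formula for the correlation coefficient. The sketch below assumes the form commonly attributed to this metric, a signed square root of the chi-square statistic over a 2x2 word/category contingency table, so that words indicating membership score high and words indicating non-membership score low. The function names and the set-of-words document representation are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def correlation_coefficient(n_rp: int, n_rn: int, n_np: int, n_nn: int) -> float:
    """Signed square root of chi-square (assumed form of the metric).

    n_rp: relevant documents that contain the word
    n_rn: relevant documents that do not contain the word
    n_np: non-relevant documents that contain the word
    n_nn: non-relevant documents that do not contain the word
    """
    n = n_rp + n_rn + n_np + n_nn
    denom = (n_rp + n_rn) * (n_np + n_nn) * (n_rp + n_np) * (n_rn + n_nn)
    if denom == 0:
        return 0.0  # the word or the category is degenerate in this sample
    return math.sqrt(n) * (n_rp * n_nn - n_rn * n_np) / math.sqrt(denom)


def select_features(docs: List[Tuple[Set[str], bool]], k: int) -> List[str]:
    """Return the k words with the highest correlation coefficient.

    docs is a list of (set of words, is_relevant) pairs for one category.
    Ties are broken alphabetically to keep the output deterministic.
    """
    vocab: Set[str] = set().union(*(words for words, _ in docs))
    scores: Dict[str, float] = {}
    for word in vocab:
        n_rp = sum(1 for words, rel in docs if rel and word in words)
        n_rn = sum(1 for words, rel in docs if rel and word not in words)
        n_np = sum(1 for words, rel in docs if not rel and word in words)
        n_nn = sum(1 for words, rel in docs if not rel and word not in words)
        scores[word] = correlation_coefficient(n_rp, n_rn, n_np, n_nn)
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]


# Toy usage: two relevant and two non-relevant documents.
toy_docs = [
    ({"wheat", "export"}, True),
    ({"wheat", "grain"}, True),
    ({"oil", "export"}, False),
    ({"oil", "crude"}, False),
]
top_words = select_features(toy_docs, k=2)
```

Unlike plain chi-square, which is always non-negative, this signed form ranks "oil" (a word that signals the document is not in the category) at the bottom of the toy list rather than alongside "wheat".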
Tables
  • Table 1: The perceptron learning algorithm
  • Table 2: Effect of feature selection method and feature set size on break-even point
  • Table 3: Results on the Reuters test corpus
  • Table 4: Successive improvements to CLASSI and comparison with TCS