Information extraction from the web: techniques and applications

Information extraction from the web: techniques and applications(2007)

引用 27|浏览35
暂无评分
摘要
Web Information Extraction (WIE) systems have recently been able to extract massive quantities of relational data from online text. This has opened the possibility of achieving an elusive goal in Artificial Intelligence (AI): broad-coverage domain knowledge. AI systems depend to a great extent on having knowledge about the domains in which they operate, and such knowledge is typically expensive to enter into the system. Furthermore, the knowledge must be entered for every different domain in which an application is to operate. The Web contains knowledge about all kinds of different domains, but in a format that is not readily usable by AI systems. WIE promises to bridge the gap between the Web and AI. Natural Language Processing is an example of an area in AI in which knowledge can make a dramatic difference in the performance of an application. Understanding or interpreting language depends on the ability to understand the words used in a domain. The meanings, usages, and syntactic properties of words, and the relative frequency with which certain words are used, are necessary pieces of information for effective language processing, and much of this information can be extracted from text. In one case study, this thesis examines methods for using extracted information in improving a particular kind of language processing tool, a parser. Before information extraction can become broadly useful, however, more research must be done to improve the quality of the extracted information. A number of factors affect the quality, including correctness, importance or relevance, and the sophistication of meaning representation. The second case study in this thesis investigates a method for resolving synonyms in extracted information. This technique changes the meaning representation of extractions from one that relates words or names to one that relates entities to one another.
更多
查看译文
关键词
meaning representation,effective language processing,Web Information Extraction,broad-coverage domain knowledge,online text,AI system,language processing tool,case study,different domain,information extraction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要