Domain adaptation for parsing

(2011)


Introduction
  • This chapter introduces the problem of domain dependence of natural language processing systems in a general machine learning setting.
  • It provides an overview of domain adaptation techniques that address this problem, introduces straightforward baselines, and discusses prior work on domain adaptation with a special focus on natural language parsing.
  • The ultimate goal of natural language processing (NLP) is to build systems that are able to understand and/or produce natural language, just as humans do.
  • This is an intrinsically difficult task because natural language is ambiguous at all linguistic levels.
Highlights
  • To build systems that perform well on NLP tasks, supervised machine learning (ML) algorithms are used to learn a model from annotated training data; the general machine learning setup is illustrated in Figure 3.1.
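
The supervised setup described above (learn a model from annotated training data, then apply it to new text) can be sketched with a toy example. This is only an illustration under stated assumptions: it uses a most-frequent-tag word tagger as a stand-in for a parser, and the sentences, tags, and function names are all hypothetical and do not come from the chapter.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn, for each word, the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return lexicon, default_tag


def accuracy(model, tagged_sentences):
    """Fraction of tokens whose predicted tag equals the annotated tag."""
    lexicon, default_tag = model
    correct = total = 0
    for sentence in tagged_sentences:
        for word, tag in sentence:
            predicted = lexicon.get(word.lower(), default_tag)
            correct += predicted == tag
            total += 1
    return correct / total


# Hypothetical annotated data: a newswire-like source domain
# and a biomedical-like target domain.
source_train = [
    [("stocks", "NOUN"), ("fell", "VERB")],
    [("the", "DET"), ("market", "NOUN"), ("rose", "VERB")],
    [("the", "DET"), ("bank", "NOUN"), ("reported", "VERB"), ("profits", "NOUN")],
]
source_test = [[("the", "DET"), ("market", "NOUN"), ("fell", "VERB")]]
target_test = [[("the", "DET"), ("protein", "NOUN"), ("binds", "VERB"), ("rapidly", "ADV")]]

model = train_most_frequent_tag(source_train)
in_domain = accuracy(model, source_test)
out_of_domain = accuracy(model, target_test)
print(f"in-domain accuracy:     {in_domain:.2f}")
print(f"out-of-domain accuracy: {out_of_domain:.2f}")
```

The model scores perfectly on the in-domain test sentence but falls back to its default tag for unseen target-domain words, which is one simple way domain dependence shows up.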
Results
  • The Charniak parser reaches an accuracy of around 90% when tested on data from the same domain as its training data (WSJ).
  • Accuracy drops by about 6% (in absolute bracketing F-score) when the model is applied to the more varied Brown corpus, which contains fiction and non-fiction literature.
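
To make the metric concrete, here is a minimal sketch of a labeled bracketing F-score and of the difference between an absolute and a relative drop. It assumes brackets are represented as (label, start, end) tuples and treats them as sets, which is a simplification of standard PARSEVAL scoring. The toy brackets are hypothetical, and the 90 and 84 figures are only the rounded values implied by "around 90%" and "about 6%", not exact results.

```python
def bracketing_f_score(gold_brackets, predicted_brackets):
    """Harmonic mean of bracket precision and recall (set-based simplification)."""
    gold = set(gold_brackets)
    predicted = set(predicted_brackets)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def absolute_drop(in_domain_score, out_of_domain_score):
    """Difference in score points."""
    return in_domain_score - out_of_domain_score


def relative_drop(in_domain_score, out_of_domain_score):
    """Drop as a percentage of the in-domain score."""
    return 100.0 * (in_domain_score - out_of_domain_score) / in_domain_score


# Hypothetical brackets for one five-word sentence.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
toy_f = bracketing_f_score(gold, predicted)

# Rounded figures implied by the text: about 90 in-domain, about 6 points lower on Brown.
abs_drop = absolute_drop(90.0, 84.0)
rel_drop = relative_drop(90.0, 84.0)
print(f"toy F-score: {toy_f:.2f}")
print(f"absolute drop: {abs_drop:.1f} points, relative drop: {rel_drop:.1f}%")
```

Under these rounded figures, a drop of 6 points absolute corresponds to a relative drop of roughly 6.7%, which is why the text specifies that the figure is absolute.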
Conclusion
  • Summary and Outlook: The goal of this chapter was to introduce the problem of domain dependence and to provide an overview of approaches that tackle it, commonly known as domain adaptation techniques.
  • A domain is defined by the given corpus.
  • This implies that most previous work on domain adaptation focused on adapting a system trained on one specific domain to one particular other domain.
  • In this setting, one knew that a domain change had occurred, knew what the source and target domains were, and had data from the target domain available to exploit.
  • The author examines the adaptation of the syntactic disambiguation component of a grammar-driven parser in two chapters, while the last chapter focuses on domain adaptation for a data-driven dependency parser.
Tables
  • Table 1: F-scores of the Charniak PCFG parser trained on the Penn Treebank WSJ and evaluated on different domains (as reported in McClosky, 2010; p. 44)
  • Table 2: Parsing results from Gildea (2001) by training and test corpus. Training set sizes: WSJ 39,832 sentences; Brown 21,818 sentences
  • Table 3: Results reported in Hara et al. (2005), where the general model is used as the reference distribution for the target domain. Corpus sizes: Genia 3,524 sentences; WSJ 39,832 sentences
References
  • 2. Unsupervised domain adaptation (e.g. Blitzer, McDonald & Pereira, 2006; McClosky et al., 2006)
  • 3. Semi-supervised domain adaptation (e.g. Daume III, Kumar & Saha, 2010; Chang, Connor & Roth, 2010)
Authors
Barbara Plank