Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

NLPLING '10: Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground (2010)

Abstract
Since the early 1990s, with the advancement of machine learning methods and the availability of data resources such as treebanks and parallel corpora, data-driven approaches to NLP have made significant progress. The success of these approaches has cast doubt on the relevance of linguistics to NLP. Conversely, NLP techniques are rarely used to support linguistic studies. We believe that there is room to expand the involvement of linguistics in NLP, and likewise of NLP in linguistics, and that the cross-pollination of ideas between the disciplines can greatly benefit both fields.

We are pleased to present the workshop on NLP and Linguistics: Finding the Common Ground, which focuses on work that uses NLP and linguistics for mutual benefit and discusses future plans for continuing collaborations. The workshop is intended to spur discussion on how NLP and linguistics can help each other, including new methods for incorporating linguistic knowledge into statistical systems to advance the state of the art of NLP, and the feasibility of using NLP techniques to acquire linguistic knowledge for a large number of languages and to assist linguistic studies.

Fifteen papers were submitted and nine were accepted (one was later withdrawn). The accepted papers are oriented around the following themes:

• Research that shows awareness of a particular linguistic phenomenon and its effects on statistical systems: Caines and Buttery discuss the zero auxiliary construction (You talking to me?), awareness of which can improve the performance of NLP on spoken English. Samardžić and Merlo suggest that awareness of different types of light verb constructions could affect word alignment. Su, Huang, and Chen show that the linguistic notion of evidentiality can be used for automatic detection of trustworthiness.
• New methods for incorporating linguistic knowledge into statistical systems to improve the state of the art: The papers by Caines and Buttery, Cook and Stevenson, Samardžić and Merlo, and Su, Huang, and Chen all present linguistic features that can be used for modeling or other corpus-based tasks.
• Research that demonstrates the feasibility of creating NLP systems to automatically acquire linguistic knowledge for a large number of languages: Mayer, Rohrdantz, Plank, Bak, Butt, and Keim examine a phonotactic constraint in 3,200 languages. Poornima and Good propose repurposing traditional word lists from historical and comparative linguistics for NLP applications.
• Research that demonstrates the benefits of using NLP techniques to help particular linguistic studies: This volume is rich with examples of corpus-based techniques shedding light on linguistic phenomena, including the ambiguity of German past participles (Zarrieß, Cahill, Kuhn, and Rohrer), zero auxiliary constructions (Caines and Buttery), light verbs (Samardžić and Merlo), a paradoxical reading of "no X is too Y to Z" (Cook and Stevenson), the phonotactic constraint of Similar Place Avoidance (Mayer, Rohrdantz, Plank, Bak, Butt, and Keim), and evidentiality (Su, Huang, and Chen).
• The relative strengths and weaknesses of corpus-based and rule-based resources: Plank and van Noord examine the domain portability of rule-based and corpus-trained parsers. Zarrieß, Cahill, Kuhn, and Rohrer show that a corpus-based analysis can help reduce the ambiguity of German past participles in a rule-based parser.
Keywords
phonotactic constraint, NLP technique, statistical system, German past participle, large number, common ground, data-driven approach, new method, linguistic knowledge, NLP application, NLP system