Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data

Daniel Lopresti, Shourya Roy, Klaus Schulz, L. Venkata Subramaniam

Third Workshop on Analytics for Noisy Unstructured Text Data (AND) (2009)


Abstract

Broadly speaking, noise in text comes from two types of sources: (i) text produced by processing signals intended for human use, such as printed or handwritten documents, spontaneous speech, and camera-captured scene images; and (ii) human-generated natural language text, such as electronic text from the Internet (emails, message boards, newsgroups, blogs, wikis, chat logs, and web pages), contact centers (customer complaints, emails, call transcriptions, message summaries), and mobile phones (text messages). Such noisy data is pervasive, and analyzing it is important but requires moving beyond traditional text analytics techniques. The Third Workshop on Analytics for Noisy Unstructured Text Data (AND 2009) follows two successful previous editions: AND 2007 (in conjunction with the 20th International Joint Conference on Artificial Intelligence [IJCAI]) and AND 2008 (in conjunction with the 31st Annual International ACM SIGIR Conference). AND 2007 succeeded in creating awareness of noisy text analytics and emphasizing its importance. It brought together industrial and academic researchers from various areas, leading to high-quality papers that were featured in a special issue of the International Journal on Document Analysis and Recognition (IJDAR). Researchers also shared several synthetic and real datasets from various domains for the benefit of the community. AND 2008 built on the success of the first edition and brought together a larger community of people from different parts of the world, with a specific focus on Information Retrieval as the application area. Another special issue of IJDAR is currently in production. Following this trend, the third edition of the AND workshop is being organized in conjunction with the Tenth International Conference on Document Analysis and Recognition (ICDAR'2009). Once again, we expect to see a range of thought-provoking discussions on methods for handling noise in text and related topics.
As with previous editions, the workshop Call for Papers drew a good response. We received 22 submissions spanning a diverse set of issues relevant to noisy text analytics. Each submission was reviewed by at least three members of the program committee. The workshop will feature 15 contributed presentations, invited talks by Dr. Hildelies Balk (Programme Manager for the EU project IMPACT) and Dr. Venu Govindaraju (Distinguished Professor of CSE at SUNY, Buffalo), and several working group discussion sessions spread throughout the day. Through these opportunities for interaction, we hope AND 2009 will continue to foster the international research community, as the first two AND workshops did.

Keywords

Third Workshop, noisy text analytics, Annual International ACM SIGIR, traditional text analytics technique, electronic text, text message, special issue, Document Analysis, Noisy Unstructured Text Data, text analytics, natural language text, International Journal
