Think Beyond the Word: Understanding the Implied Textual Meaning by Digesting Context, Local, and Noise

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020.

Abstract
Implied semantics is a complex language act that appears everywhere in cyberspace. The prevalence of implied spam texts, such as implied pornography, sarcasm, and abuse hidden within novels, tweets, microblogs, or reviews, can be extremely harmful to the physical and mental health of teenagers. The non-literal interpretation of implied text is difficult for machine models to understand because of its high context-sensitivity and heavy use of figurative language. In this study, inspired by human reading comprehension, we propose a novel, simple, and effective deep neural framework, called the Skim and Intensive Reading Model (SIRM), for inferring implied textual meaning. The proposed SIRM consists of three main components: a skim reading component, an intensive reading component, and an adversarial training component. The skim reading component, a combination of several convolutional neural networks, quickly extracts n-gram features as skim (entire) information. The intensive reading component performs a hierarchical investigation of both sentence-level and paragraph-level representations, encapsulating the current (local) embedding and the contextual information (context) with a dense connection; the contextual information includes both the near-neighbor information and the skim information mentioned above. Finally, in addition to the common training loss function, we employ an adversarial loss function as a penalty over the skim reading component to eliminate noisy information (noise) arising from special figurative words in the training data. To verify the effectiveness, robustness, and efficiency of the proposed architecture, we conduct extensive comparative experiments on an industrial novel dataset involving implied pornography and on three sarcasm benchmarks. Experimental results indicate that (1) the proposed model, which benefits from context and local modeling and from explicit consideration of figurative language (noise), outperforms existing state-of-the-art solutions at a comparable parameter scale and running speed; (2) SIRM is more robust in terms of parameter size sensitivity; (3) compared with ablation and addition variants of SIRM, the final framework is sufficiently efficient.
Keywords
implied textual meaning, semantic representation, text classification, deep neural networks
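
To make the described architecture concrete, below is a minimal PyTorch sketch of the pipeline the abstract outlines: a CNN-based skim reader, a hierarchical sentence-then-paragraph intensive reader with the skim vector densely concatenated in, and an auxiliary adversarial head over the skim features. All module names, layer sizes, pooling choices, and the concrete adversarial formulation (SkimReader, IntensiveReader, the adv linear layer) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of SIRM, assuming PyTorch and hypothetical layer sizes.
import torch
import torch.nn as nn


class SkimReader(nn.Module):
    """Skim reading: parallel CNNs extract n-gram features over the whole text."""

    def __init__(self, emb_dim, n_filters=64, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes)
        self.out_dim = n_filters * len(kernel_sizes)

    def forward(self, x):                       # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                   # (batch, emb_dim, seq_len)
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)          # (batch, out_dim)


class IntensiveReader(nn.Module):
    """Intensive reading: sentence-level then paragraph-level encoders, with
    the skim vector densely concatenated as contextual information."""

    def __init__(self, emb_dim, skim_dim, hidden=128):
        super().__init__()
        self.sent_rnn = nn.GRU(emb_dim + skim_dim, hidden,
                               batch_first=True, bidirectional=True)
        self.para_rnn = nn.GRU(2 * hidden + skim_dim, hidden,
                               batch_first=True, bidirectional=True)

    def forward(self, x, skim):   # x: (batch, n_sent, sent_len, emb_dim)
        b, n_sent, sent_len, _ = x.shape
        skim_tok = skim[:, None, None, :].expand(b, n_sent, sent_len, -1)
        h, _ = self.sent_rnn(
            torch.cat([x, skim_tok], dim=-1).view(b * n_sent, sent_len, -1))
        sent_vec = h.max(dim=1).values.view(b, n_sent, -1)
        skim_sent = skim[:, None, :].expand(b, n_sent, -1)
        h, _ = self.para_rnn(torch.cat([sent_vec, skim_sent], dim=-1))
        return h.max(dim=1).values               # paragraph representation


class SIRM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.skim = SkimReader(emb_dim)
        self.intensive = IntensiveReader(emb_dim, self.skim.out_dim)
        self.clf = nn.Linear(2 * 128, n_classes)
        # Hypothetical adversarial head: predicts the label from skim features
        # alone, so a penalty can discourage the skim reader from exploiting
        # figurative "noise" words (the paper's exact loss may differ).
        self.adv = nn.Linear(self.skim.out_dim, n_classes)

    def forward(self, tokens):    # tokens: (batch, n_sent, sent_len) word ids
        x = self.emb(tokens)
        skim = self.skim(x.flatten(1, 2))        # sentences merged into one sequence
        para = self.intensive(x, skim)
        return self.clf(para), self.adv(skim)
```

In such a setup, training would combine the main classification loss on the paragraph representation with an adversarial penalty on the skim branch (for example via alternating min-max updates or a gradient reversal layer); the exact objective follows the paper rather than this sketch.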