Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning

EMNLP 2020, pp. 7755–7765.

We formulate general Interactive Fiction game playing as a Multi-Passage Reading Comprehension (MPRC) task, enabling an MPRC-style solution that efficiently addresses the key IF game challenges of a huge combinatorial action space and partial observability in a unified framework.

Abstract:

Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques. In contrast to previous text games with mostly synthetic texts, IF games pose language understanding challenges on the human-written textual descriptions of diverse and sophisticated game worlds…

Introduction
  • Interactive systems capable of understanding natural language and responding in the form of natural language text have high potential in various applications.
  • In pursuit of building and evaluating such systems, the authors study learning agents for Interactive Fiction (IF) games.
  • IF gameplay agents need to simultaneously understand the game’s information from a text display and generate natural language commands via a text input interface.
  • Without providing an explicit game strategy, the agents need to identify behaviors that maximize objective-encoded cumulative rewards.
  • IF games composed of human-written texts create superb new opportunities for studying and evaluating natural language understanding (NLU) techniques due to their unique characteristics.
  • (1) The resulting texts in IF games are more linguistically diverse and sophisticated than the template-generated ones in synthetic text games. (2) The language contexts of IF games…
Highlights
  • Interactive systems capable of understanding natural language and responding in the form of natural language text have high potential in various applications
  • In pursuit of building and evaluating such systems, we study learning agents for Interactive Fiction (IF) games
  • We propose a novel formulation of IF game playing as Multi-Passage Reading Comprehension (MPRC) and harness MPRC techniques to solve the huge action space and partial observability challenges
  • We summarize the performance of our Multi-Paragraph Reading Comprehension Deep Q-Network (MPRC-DQN) agent and baselines in Table 2
  • We formulate general IF game playing as an MPRC task, enabling an MPRC-style solution to efficiently address the key IF game challenges of the huge combinatorial action space and partial observability in a unified framework
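The MPRC-style decomposition above can be illustrated with a toy sketch: each retrieved observation passage is scored against candidate action templates, the per-passage scores are max-pooled into a value per template, and the agent acts greedily. The bag-of-words scorer and all function names here are illustrative stand-ins, not the paper's learned RC reader.

```python
import re

def tokenize(text):
    # Lowercase and keep alphabetic tokens only, dropping punctuation.
    return re.findall(r"[a-z]+", text.lower())

def score_template(passage, template):
    """Toy relevance score: fraction of template words found in the
    passage. The real agent replaces this with a learned RC model."""
    p, t = set(tokenize(passage)), set(tokenize(template))
    return len(p & t) / max(len(t), 1)

def select_action(passages, templates):
    """Max-pool per-passage scores into one value per action template
    and act greedily, mirroring the multi-passage decomposition."""
    q_values = {
        tmpl: max(score_template(p, tmpl) for p in passages)
        for tmpl in templates
    }
    return max(q_values, key=q_values.get), q_values

passages = ["You are in a dark room. A door leads north.",
            "There is a brass lantern here."]
templates = ["open door", "take lantern", "go north"]
action, q = select_action(passages, templates)
```

In the actual agent the template scores come from a trained Q-network over RC-encoded passages, and object slots in each template are filled by span extraction; this sketch only shows the retrieve-score-act shape of the formulation.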
Methods
  • The authors evaluate the proposed methods on the suite of Jericho-supported games, comparing against all previous baselines, including recent methods that address the huge action space and partial observability challenges.

    4.1 Setup: Jericho Handicaps and Configuration.
  • Previous work included the last action or game score as additional inputs.
  • The authors' model discards these two types of inputs, as the authors did not observe a significant difference in model performance.
  • The maximum game step number is set to 100, following the baselines
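The evaluation setup above can be sketched as a minimal episode loop. `StubEnv` is a placeholder standing in for a Jericho-backed environment (its `reset`/`step` interface is an assumption here, loosely modeled on such environments); the 100-step cap follows the baselines, and the policy sees only the current observation text, with no last-action or game-score inputs.

```python
MAX_STEPS = 100  # maximum game step number, following the baselines

class StubEnv:
    """Placeholder environment with a fixed observation and reward,
    used only to make the loop below runnable."""
    def reset(self):
        return "West of House. You are standing in an open field."
    def step(self, action):
        obs, reward, done = "Opened.", 1, False
        return obs, reward, done

def run_episode(env, policy):
    obs = env.reset()
    total_reward = 0
    for _ in range(MAX_STEPS):   # hard cap at 100 game steps
        action = policy(obs)     # the agent sees only the current text
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

score = run_episode(StubEnv(), lambda obs: "open mailbox")
```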
Results
  • The authors' approaches matched or outperformed the state-of-the-art performance on 25 out of 33 games, trained with less than one-tenth of the game interaction data used by prior art.
Conclusion
  • The authors formulate general IF game playing as an MPRC task, enabling an MPRC-style solution to efficiently address the key IF game challenges of the huge combinatorial action space and partial observability in a unified framework.
  • The authors' approaches achieved significant improvements over the previous state-of-the-art in both game scores and training data efficiency.
  • The authors' formulation bridges broader NLU/RC techniques to address other critical challenges in IF games in future work, e.g., common-sense reasoning, novelty-driven exploration, and multi-hop inference
Tables
  • Table1: Summary of the main technical differences between our agent and the baselines. All agents use DQN to update the model parameters, except KG-A2C, which uses A2C. All agents use the same handicaps.
  • Table2: Average game scores on Jericho benchmark games. The best-performing agent's score per game is in bold. The "Winning percentage / counts" row gives the percentage / count of games on which the corresponding agent is best. Baseline scores are taken from their papers; missing scores, marked “–”, correspond to games that KG-A2C skipped. We also add the 100-step results from a human-written game-playing walkthrough as a reference for human-level scores. Difficulty levels defined in the original Jericho paper are denoted by the colors of the game names: possible (i.e., easy or normal) games in green, difficult games in tan, and extreme games in red. Best viewed in color. (a) The Zork3 walkthrough does not maximize the score in the first 100 steps but explores more. (b) Our agent discovers some unbounded reward loops in the game Ztuu.
  • Table3: Difficulty levels and characteristics of games on which our approach achieves the most considerable improvement. Dialog indicates that it is necessary to speak with another character. Darkness indicates that accessing some dark areas requires a light source. Nonstandard Actions refers to actions with words not in an English dictionary. Inventory Limit restricts the number of items carried by the player. Please refer to (Hausknecht et al., 2019a) for more comprehensive definitions.
  • Table4: Pairwise comparison between our MPRC-DQN versus each baseline
Related work
  • IF Game Agents. Previous work mainly studies text understanding and generation in parser-based or rule-based text game tasks, such as the TextWorld platform (Côté et al., 2018) or custom domains (Narasimhan et al., 2015; He et al., 2016; Adhikari et al., 2020). The recent platform Jericho (Hausknecht et al., 2019a) supports over thirty human-written IF games. Earlier successes in real IF games mainly rely on heuristics without learning. NAIL (Hausknecht et al., 2019b) is the state-of-the-art among these “no-learning” agents, employing a series of reliable heuristics for exploring the game, interacting with objects, and building an internal representation of the game world. With the development of learning environments like Jericho, RL-based agents have started to achieve dominant performance.
References
  • Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, and William L. Hamilton. 2020. Learning dynamic knowledge graphs to generalize on text-based games. arXiv preprint arXiv:2002.09127.
  • Prithviraj Ammanabrolu and Matthew Hausknecht. 2020. Graph constrained reinforcement learning for natural language action spaces. arXiv preprint arXiv:2001.08837.
  • Prithviraj Ammanabrolu and Mark Riedl. 2019. Playing text-adventure games with graph-based deep reinforcement learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3557–3565.
  • Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, and Caiming Xiong. 2019. Learning to retrieve reasoning paths over Wikipedia graph for question answering. arXiv preprint arXiv:1911.10470.
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
  • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1870–1879.
  • Kyunghyun Cho, Bart Van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
  • Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, et al. 2018. TextWorld: A learning environment for text-based games. In Workshop on Computer Games, pages 41–75. Springer.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Ming Ding, Chang Zhou, Qibin Chen, Hongxia Yang, and Jie Tang. 2019. Cognitive graph for multi-hop reading comprehension at scale. In Proceedings of ACL 2019.
  • Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, et al. 2019. Multi-step entity-centric information retrieval for multi-hop question answering. arXiv preprint arXiv:1909.07598.
  • Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, and Xingdi Yuan. 2019a. Interactive fiction games: A colossal adventure. arXiv preprint arXiv:1909.05398.
  • Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, and Jason D. Williams. 2019b. NAIL: A general interactive fiction agent. arXiv preprint arXiv:1902.04259.
  • Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, and Mari Ostendorf. 2016. Deep reinforcement learning with a natural language action space. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621–1630.
  • Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611.
  • Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, et al. 2019. Natural Questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7:453–466.
  • Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. 2018. Denoising distantly supervised open-domain question answering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1736–1745.
  • Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, page 3.
  • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2019a. A discrete hard EM approach for weakly supervised question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2844–2857.
  • Sewon Min, Danqi Chen, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2019b. Knowledge guided text retrieval and reading for open domain question answering. arXiv preprint arXiv:1911.03868.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
  • Xiangyang Mou, Mo Yu, Bingsheng Yao, Chenghao Yang, Xiaoxiao Guo, Saloni Potdar, and Hui Su. 2020. Frustratingly hard evidence retrieval for QA over books. In Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, pages 108–113.
  • Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. 2015. Language understanding for text-based games using deep reinforcement learning. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1–11.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of NAACL-HLT, pages 2227–2237.
  • Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392.
  • Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bidirectional attention flow for machine comprehension.
  • Shuohang Wang and Jing Jiang. 2016. Machine comprehension using Match-LSTM and answer pointer. arXiv preprint arXiv:1608.07905.
  • Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerry Tesauro, Bowen Zhou, and Jing Jiang. 2018. R3: Reinforced ranker-reader for open-domain question answering. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. 2017. Evidence aggregation for answer re-ranking in open-domain question answering. arXiv preprint arXiv:1711.05116.
  • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018. QANet: Combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541.
  • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, and Shie Mannor. 2018. Learn what not to learn: Action elimination with deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 3562–3573.