Dialog Policy Learning for Joint Clarification and Active Learning Queries


Abstract:

Intelligent systems need to be able to recover from mistakes, resolve uncertainty, and adapt to novel concepts not seen during training. Dialog interaction can enable this by the use of clarifications for correction and resolving uncertainty, and active learning queries to learn new concepts encountered during operation. Prior work on d...

Introduction
  • The ability to understand and communicate in natural language can improve the accessibility of systems such as robots, home devices and computers to non-expert users.
  • Since language is often ambiguous, it is desirable for such systems to engage in a dialog with the user to clarify their intentions and obtain missing information.
  • The authors use clarification to refer to any dialog act that enables the system to better understand an ongoing user request.
  • The authors use the term active learning to refer to dialog acts used to obtain knowledge of novel concepts, with the primary purpose of improving the underlying language understanding model and thereby improving performance on future interactions.
Highlights
  • The ability to understand and communicate in natural language can improve the accessibility of systems such as robots, home devices and computers to non-expert users.
  • We use the term active learning to refer to dialog acts used to obtain knowledge of novel concepts, with the primary purpose of improving the underlying language understanding model and thereby improving performance on future interactions.
  • We present a dialog task that combines natural language image retrieval with both Opportunistic Active Learning (OAL) and attribute-based clarification.
  • We model each interaction as an episode in a Markov Decision Process (MDP) where the state consists of the images in the active training and test sets, the attributes mentioned in the target description, the current parameters of the classifier, and the set of queries asked and their responses (see the state sketch after this list).
  • We demonstrate how a combination of policies learned with reinforcement learning (RL) for choosing attribute-based clarification and active learning queries can be used to improve an interactive system that needs to retrieve images based on a natural language description, while encountering novel attributes at test time not seen during training.
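The MDP state enumerated above maps naturally onto a small container. A minimal Python sketch of that state follows; all field names and types are illustrative assumptions, not the paper's exact representation:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Sketch of the MDP state described in the highlights above; field names
# and types are illustrative assumptions.
@dataclass
class DialogState:
    train_images: List[str]        # active training set (image ids)
    test_images: List[str]         # candidate images for retrieval
    target_attributes: Set[str]    # attributes mentioned in the target description
    classifier_params: dict        # current parameters of the attribute classifier
    query_history: Dict[str, str] = field(default_factory=dict)  # query -> user response
```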
Methods
  • 4.1 Visual Attribute Classifier

    The authors train a multilabel classifier for predicting visual attributes given an image.
  • The authors extract features φ(i) for the images using the penultimate layer of an Inception-V3 network (Szegedy et al. 2016) pretrained on ImageNet (Russakovsky et al. 2015).
  • These are passed through two separate fully connected (FC) layers with ReLU activations, whose outputs are summed to produce the final representation f(i) used for classification, as sketched below.
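A minimal PyTorch sketch of this classifier head follows. The 2048-dimensional input matches the Inception-V3 penultimate layer; the hidden size, attribute count, and final sigmoid output layer are assumptions for illustration:

```python
import torch
import torch.nn as nn

class AttributeClassifier(nn.Module):
    # Sketch of the multilabel attribute classifier described above.
    # Input: precomputed Inception-V3 penultimate-layer features phi(i).
    # hidden_dim and num_attributes are illustrative assumptions.
    def __init__(self, feat_dim=2048, hidden_dim=512, num_attributes=100):
        super().__init__()
        # Two separate FC layers with ReLU; their outputs are summed
        # to produce the final representation f(i).
        self.fc_a = nn.Linear(feat_dim, hidden_dim)
        self.fc_b = nn.Linear(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_attributes)  # assumed output head

    def forward(self, phi):
        f = torch.relu(self.fc_a(phi)) + torch.relu(self.fc_b(phi))
        # Independent per-attribute probabilities for multilabel prediction.
        return torch.sigmoid(self.out(f))

probs = AttributeClassifier()(torch.randn(8, 2048))  # batch of 8 feature vectors
```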
Results
  • Results and Discussion

    The authors initialize the policy with 4 batches of dialogs, followed by 4 batches of dialogs for the training phase, and 5 batches of dialogs in the testing phase.
  • Table 1 shows the performance in the final test batch of the best fully learned policy, as well as a selected subset of the baselines.
  • The fully learned policy uses significantly shorter dialogs than all conditions with a static decision policy.
  • Some other conditions result in shorter dialogs, but these ...
    [Figure 3: Comparison of guess success rate with and without clarifications across test batches. (a) Best Learned Policy; (b) Baseline Static Policy.]
Conclusion
  • The authors demonstrate how a combination of RL-learned policies for choosing attribute-based clarification and active learning queries can be used to improve an interactive system that needs to retrieve images based on a natural language description, while encountering novel attributes at test time not seen during training.
  • The authors further show that in this challenging setup, a combination of learned clarification and active learning policies is necessary to obtain improvement over directly performing retrieval without interaction (see the sketch below).
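To make the interaction structure concrete, here is a minimal sketch of one such retrieval dialog, with a toy user simulator and a random placeholder policy; the action names, simulator interface, and termination rule are illustrative assumptions, not the authors' implementation:

```python
import random

# Illustrative action set: guess an image, ask an attribute-based
# clarification, or ask an active learning query. Names are assumptions.
ACTIONS = ["guess", "clarify", "active_learn"]

class ToyUserSimulator:
    # Stand-in user for illustration; the paper evaluates with simulated
    # users and Amazon Mechanical Turk workers.
    def __init__(self, images, target):
        self.images, self.target = images, target

    def best_guess(self, history):
        # A real agent would rank images with the attribute classifier.
        return random.choice(self.images)

    def clarification(self, history):
        return ("Is the target red?", random.choice(["yes", "no"]))

    def active_learning_query(self, history):
        return ("Does img1 have the attribute 'striped'?", random.choice(["yes", "no"]))

def run_episode(policy, sim, max_turns=10):
    # One retrieval dialog: interleave queries with a terminal guess.
    history = {}
    for _ in range(max_turns):
        action = policy(history)
        if action == "guess":
            return sim.best_guess(history) == sim.target  # episode ends on a guess
        query, answer = (sim.clarification(history) if action == "clarify"
                         else sim.active_learning_query(history))
        history[query] = answer  # responses become part of the dialog state
    return False  # ran out of turns without guessing

sim = ToyUserSimulator(images=["img1", "img2", "img3"], target="img2")
print(run_episode(lambda h: random.choice(ACTIONS), sim))
```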
Summary
  • Objectives:

    The authors' goal was to make an initial attempt at combining the functions of clarification and active learning in a single dialog task; binary questions for both functions were considered a reasonable starting point.
  • For the qualification task, the goal was to choose relatively unambiguous clarification and active learning queries, and to allow only users who correctly copied the description and answered both queries as expected to participate in the main experiment.
Tables
  • Table 1: Results from the final batch of the test phase.
  • Table 2: Results of the static and new learned policies at the end of the test phase, in simulation and in interactions on Amazon Mechanical Turk. Bold indicates a statistically significant improvement over the baseline (p < 0.05) and italic indicates trending significance (p ≤ 0.1) according to an unpaired Welch t-test (see the snippet after this list).
  • Table 3: Unabridged results from the final batch of the test phase. ∗ indicates conditions whose performance is comparable to the best condition (in bold).
  • Table 4: A sample interaction from Amazon Mechanical Turk.
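As a note on the significance testing in the Table 2 caption: an unpaired Welch t-test compares two conditions without assuming equal variances. A minimal SciPy sketch, with made-up per-dialog success indicators as placeholder data:

```python
from scipy import stats

# Made-up per-dialog success indicators (1 = correct guess) for two
# conditions; placeholders only, not the paper's data.
baseline_policy = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
learned_policy = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

# equal_var=False selects Welch's t-test (unequal variances assumed).
t_stat, p_value = stats.ttest_ind(learned_policy, baseline_policy, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```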
Funding
  • This work was supported by a Google Faculty Award received by Raymond J. Mooney and by NSF NRI grants IIS-1925082 and IIS-1637736.
Study subjects and analysis
workers: 50
We required workers to have completed at least 1,000 HITs with at least a 95% approval rate on their previous HITs, and to complete a qualification task demonstrating that they understood the types of questions used in our experiment. We had 50 workers interact with each system tested. The results are shown in Table 2.

References
  • Bachman, P.; Sordoni, A.; and Trischler, A. 2017. Learning Algorithms for Active Learning. In ICML, 301–310.
  • Baeza-Yates, R. 2016. Data and Algorithmic Bias in the Web. In ACM Conference on Web Science, 1–1.
  • Bhattacharya, I.; Chowdhury, A.; and Raykar, V. C. 2019. Multimodal Dialog for Browsing Large Visual Catalogs Using Exploration-Exploitation Paradigm in a Joint Embedding Space. In MMR, 187–191.
  • Bordes, A.; Boureau, Y.-L.; and Weston, J. 2017. Learning End-to-End Goal-Oriented Dialog. In ICLR.
  • Budzianowski, P.; Ultes, S.; Su, P.-H.; Mrksic, N.; Wen, T.-H.; Casanueva, I.; Barahona, L. M. R.; and Gasic, M. 2017. Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning. In SIGDIAL, 86–92.
  • Casanueva, I.; Budzianowski, P.; Su, P.-H.; Ultes, S.; Rojas-Barahona, L.; Tseng, B.-H.; and Gasic, M. 2018. Feudal Reinforcement Learning for Dialogue Management in Large Domains. In NAACL-HLT, 714–719.
  • Chaurasia, S.; and Mooney, R. J. 2017. Dialog for Language to Code. In IJCNLP, 175–180.
  • De Vries, H.; Strub, F.; Chandar, S.; Pietquin, O.; Larochelle, H.; and Courville, A. 2017. GuessWhat?! Visual Object Discovery Through Multi-modal Dialogue. In CVPR.
  • Deits, R.; Tellex, S.; Thaker, P.; Simeonov, D.; Kollar, T.; and Roy, N. 2013. Clarifying Commands with Information-theoretic Human-robot Dialog. Journal of Human-Robot Interaction 2(2): 58–79.
  • Dindo, H.; and Zambuto, D. 2010. A Probabilistic Approach to Learning a Visually Grounded Language Model Through Human-Robot Interaction. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 790–796.
  • Dodge, J.; Gane, A.; Zhang, X.; Bordes, A.; Chopra, S.; Miller, A.; Szlam, A.; and Weston, J. 2016. Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems. In ICLR.
  • Fang, M.; Li, Y.; and Cohn, T. 2017. Learning how to Active Learn: A Deep Reinforcement Learning Approach. In EMNLP.
  • Farhadi, A.; Endres, I.; Hoiem, D.; and Forsyth, D. 2009. Describing Objects by their Attributes. In CVPR, 1778–1785.
  • Guo, C.; Pleiss, G.; Sun, Y.; and Weinberger, K. Q. 2017. On Calibration of Modern Neural Networks. In ICML, 1321–1330.
  • Guo, S.; Huang, W.; Zhang, X.; Srikhanta, P.; Cui, Y.; Li, Y.; Adam, H.; Scott, M. R.; and Belongie, S. 2019. The iMaterialist Fashion Attribute Dataset. In ICCV Workshops.
  • Guo, X.; Wu, H.; Cheng, Y.; Rennie, S.; Tesauro, G.; and Feris, R. 2018. Dialog-based Interactive Image Retrieval. In NeurIPS, 678–688.
  • Hu, H.; Wu, X.; Luo, B.; Tao, C.; Xu, C.; Wu, W.; and Chen, Z. 2018. Playing 20 Question Game with Policy-Based Reinforcement Learning. In EMNLP.
  • Lackes, R.; Siepermann, M.; and Vetter, G. 2019. Can I Help You? – The Acceptance of Intelligent Personal Assistants. In International Conference on Business Informatics Research, 204–218.
  • Lee, S.-W.; Heo, Y.-J.; and Zhang, B.-T. 2018. Answerer in Questioner's Mind for Goal-oriented Visual Dialogue. In Visually-Grounded Interaction and Language Workshop (NeurIPS).
  • Li, Y.; Huang, C.; Tang, X.; and Change Loy, C. 2017. Learning to Disambiguate by Asking Discriminative Questions. In ICCV, 3419–3428.
  • Liptak, A. 2017. Amazon's Alexa started ordering people dollhouses after hearing its name on TV. URL https://www.theverge.com/2017/1/7/14200210/amazonalexa-tech-news-anchor-order-dollhouse.
  • Manuvinakurike, R.; DeVault, D.; and Georgila, K. 2017. Using Reinforcement Learning to Model Incrementality in a Fast-paced Dialogue Game. In SIGDIAL, 331–341.
  • Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 1928–1937.
  • Nastar, C.; Mitschke, M.; and Meilhac, C. 1998. Efficient Query Refinement for Image Retrieval. In CVPR, 547–552.
  • Padmakumar, A.; Stone, P.; and Mooney, R. J. 2018. Learning a Policy for Opportunistic Active Learning. In EMNLP.
  • Padmakumar, A.; Thomason, J.; and Mooney, R. J. 2017. Integrated Learning of Dialog Strategies and Semantic Parsing. In EACL, 547–557.
  • Parde, N.; Hair, A.; Papakostas, M.; Tsiakas, K.; Dagioglou, M.; Karkaletsis, V.; and Nielsen, R. D. 2015. Grounding the Meaning of Words Through Vision and Interactive Gameplay. In IJCAI.
  • Peng, B.; Li, X.; Li, L.; Gao, J.; Celikyilmaz, A.; Lee, S.; and Wong, K.-F. 2017. Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning. In EMNLP, 2231–2240.
  • Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Fei-Fei, L. 2015. ImageNet Large Scale Visual Recognition Challenge. IJCV 115(3): 211–252. doi:10.1007/s11263-015-0816-y.
  • Saha, A.; Khapra, M. M.; and Sankaranarayanan, K. 2018. Towards Building Large Scale Multimodal Domain-Aware Conversation Systems. In AAAI.
  • Settles, B. 2010. Active Learning Literature Survey. University of Wisconsin, Madison 52(55-66): 11.
  • Strub, F.; De Vries, H.; Mary, J.; Piot, B.; Courville, A.; and Pietquin, O. 2017. End-to-end Optimization of Goal-driven and Visually Grounded Dialogue Systems. In IJCAI, 2765–2771.
  • Su, P.-H.; Gasic, M.; Mrksic, N.; Rojas-Barahona, L. M.; Ultes, S.; Vandyke, D.; Wen, T.-H.; and Young, S. J. 2016. On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems. In ACL.
  • Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; and Wojna, Z. 2016. Rethinking the Inception Architecture for Computer Vision. In CVPR, 2818–2826.
  • Thomason, J.; Padmakumar, A.; Sinapov, J.; Hart, J.; Stone, P.; and Mooney, R. J. 2017. Opportunistic Active Learning for Grounding Natural Language Descriptions. In CoRL, 67–76.
  • Thomason, J.; Zhang, S.; Mooney, R.; and Stone, P. 2015. Learning to Interpret Natural Language Commands through Human-Robot Dialog. In IJCAI, 1923–1929.
  • Tieu, K.; and Viola, P. 2004. Boosting Image Retrieval. IJCV 56(1-2): 17–36.
  • Wen, T.-H.; Gasic, M.; Mrksic, N.; Rojas-Barahona, L. M.; Su, P.-H.; Ultes, S.; Vandyke, D.; and Young, S. 2016. A Network-based End-to-End Trainable Task-oriented Dialogue System. In NAACL.
  • Williams, J.; Raux, A.; and Henderson, M. 2016. The Dialog State Tracking Challenge Series: A Review. Dialogue & Discourse 7(3): 4–33.
  • Woodward, M.; and Finn, C. 2017. Active One-shot Learning. arXiv:1702.06559.
  • Young, S.; Gasic, M.; Thomson, B.; and Williams, J. D. 2013. POMDP-based Statistical Spoken Dialog Systems: A Review. Proceedings of the IEEE 101(5): 1160–1179.
  • Yu, Y.; Eshghi, A.; and Lemon, O. 2017. Learning how to Learn: An Adaptive Dialogue Agent for Incrementally Learning Visually Grounded Word Meanings. In Workshop on Language Grounding for Robotics.
  • Zhang, J.; Wu, Q.; Shen, C.; Zhang, J.; Lu, J.; and Van Den Hengel, A. 2018. Goal-oriented Visual Question Generation via Intermediate Rewards. In ECCV, 186–201.
  • Zhang, J.; Zhao, T.; and Yu, Z. 2018. Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog. In SIGDIAL, 140–150.
  • Zhu, Y.; Zhang, S.; and Metaxas, D. 2017. Interactive Reinforcement Learning for Object Grounding via Self-talking. arXiv:1712.00576.