Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
ACL, pp. 625-638, 2020.
We propose Multi-Agent Dialog Policy Learning, in which the user is regarded as another dialog agent rather than a user simulator.
Many studies in recent years have applied reinforcement learning to train a dialog policy and have shown great promise. One common approach is to employ a user simulator to obtain a large number of simulated user experiences for reinforcement learning algorithms. However, modeling a realistic user simulator is challenging. A rule-based simulator ...