DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator
Abstract:
Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response to a question given a scene, video, audio, and the history of previous turns in the dialog. Existing systems for this task employ transformer- or recurrent neural network-based architectures within the encoder-decoder framework. Even though these techniqu...