A Large-Scale Chinese Short-Text Conversation Dataset
International Conference on Natural Language Processing, pp. 91-103, 2020.
We present pre-trained models for Chinese dialogue generation, which are trained on 12M open-domain conversations.
Advances in neural dialogue generation models have shown promising results on modeling short-text conversations. However, training such models usually requires a large-scale, high-quality dialogue corpus, which is hard to obtain. In this paper, we present LCCC, a large-scale cleaned Chinese conversation dataset, which contains a base version...