Unsupervised Embedding for Latent Similarity by Modeling Heterogeneous MOOC Data

PAKDD (2017)

Abstract
Recent years have witnessed the prosperity of Massive Open Online Courses (MOOCs). One important characteristic of MOOCs is that video clips and discussion forums are integrated into a one-stop learning setting. However, discussion forums tend to be disordered and chaotic because of their massive scale and the lack of efficient management. A technical solution is to associate MOOC forum threads with the corresponding video clips, which can be regarded as a representation-learning problem. Traditional textual representations, e.g. bag-of-words (BOW), do not consider latent semantics, while recent semantic word embeddings, e.g. Word2vec, do not capture the similarity between documents, i.e. latent similarity. Learning a distinguishable textual representation is therefore the key to resolving the problem. In this paper, we propose an effective approach called No-label Sequence Embedding (NOSE), which captures not only the latent semantics within words and documents but also the latent similarity between them. We model multiform MOOC data as a heterogeneous textual network and learn low-dimensional embeddings without labels. The proposed NOSE has several advantages, e.g. it is course-agnostic and has few parameters to tune. Experimental results suggest that the learned textual representation outperforms state-of-the-art unsupervised counterparts on the task of associating forum threads with video clips.
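Once thread and video-clip embeddings are learned, the association task described in the abstract reduces to matching each thread to its nearest clip in the embedding space. The sketch below illustrates this matching step only, using cosine similarity over toy vectors; it is an assumption for illustration, not the paper's NOSE training procedure, and the function name `associate_threads` is hypothetical.

```python
import numpy as np

def associate_threads(thread_vecs, clip_vecs):
    """Return, for each thread embedding, the index of the most
    cosine-similar video-clip embedding (illustrative sketch only)."""
    # L2-normalize rows so the dot product equals cosine similarity
    t = thread_vecs / np.linalg.norm(thread_vecs, axis=1, keepdims=True)
    c = clip_vecs / np.linalg.norm(clip_vecs, axis=1, keepdims=True)
    sims = t @ c.T                 # (num_threads, num_clips) similarity matrix
    return sims.argmax(axis=1)     # best-matching clip per thread

# Toy data: 3 thread embeddings, 2 clip embeddings, 4 dimensions
threads = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0]])
clips = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
print(associate_threads(threads, clips))  # → [0 1 0]
```

With distinguishable embeddings, semantically related threads and clips land close together, so this nearest-neighbor matching is what the learned representation is ultimately evaluated on.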
Keywords
Unsupervised embedding, Latent similarity, Heterogeneous, MOOC