Content Representation and Similarity of Movies based on Topic Extraction from Subtitles.

SETN (2016)

Citations: 11
Abstract
In this paper we examine whether movie content similarity correlates with low-level textual features extracted from the corresponding subtitles. In addition, we demonstrate how a topical representation of movies can be extracted by mining their subtitles. Applying natural language processing and a topic modeling algorithm, namely Latent Dirichlet Allocation (LDA), to the movie subtitles, we extract the latent topic structure of a set of movies. To demonstrate the proposed content representation approach, we built a dataset of 160 widely known movies, each represented by its subtitles. After evaluating the quality and coherence of the resulting topics, we assess movie similarities by exploiting the movies' distances in the topic space. Finally, using these topic-space projections of the movies, we aim to create a topic model browser for movies that lets us explore different aspects of movie similarity and discover latent knowledge about the movies by associating low-level topic links with high-level movie similarities.
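The pipeline described in the abstract (subtitle text, then LDA topic mixtures, then distances between movies in topic space) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which LDA inference method, preprocessing, or distance measure was used, so the collapsed Gibbs sampler, the hyperparameters `alpha=0.1` and `beta=0.01`, the toy stopword list, the SRT-stripping rules, and the Hellinger distance are all assumptions. The helper names `srt_to_tokens`, `lda_gibbs`, and `hellinger`, and the three toy "movies", are hypothetical.

```python
import math
import random
import re

# Tiny illustrative stopword list (assumption; a real pipeline would use a full list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "you", "we", "on", "for"}


def srt_to_tokens(srt_text):
    """Drop SRT cue numbers and timestamps, then lowercase, tokenize, and filter."""
    kept = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue  # blank separator, cue index, or timing line
        kept.append(line)
    words = re.findall(r"[a-z']+", " ".join(kept).lower())
    # Assumed filter: remove stopwords and tokens of two characters or fewer.
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                             # total tokens per topic
    z = []                                          # topic assignment of every token

    def update(d, k, v, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            update(d, k, vocab[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                update(d, z[d][i], v, -1)  # remove the token's current assignment
                # Full conditional p(z = k | all other assignments), up to a constant.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + n_words * beta)
                           for k in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, z[d][i], v, 1)

    # Posterior-mean estimate of each document's topic mixture (theta).
    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
            for d, doc in enumerate(docs)]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions: 0 = identical, 1 = disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Toy subtitles for three hypothetical movies (not from the paper's dataset).
SUBTITLES = {
    "space_a": "1\n00:00:01,000 --> 00:00:03,000\nThe astronaut boards the rocket.\n\n"
               "2\n00:00:04,000 --> 00:00:06,000\nOrbit the planet, then land the spaceship.\n",
    "space_b": "1\n00:00:01,000 --> 00:00:03,000\nOur spaceship leaves orbit.\n\n"
               "2\n00:00:04,000 --> 00:00:06,000\nThe astronaut sees the planet and the rocket.\n",
    "cooking": "1\n00:00:01,000 --> 00:00:03,000\nThe chef mixes flour in the kitchen.\n\n"
               "2\n00:00:04,000 --> 00:00:06,000\nPut the recipe dough in the oven.\n",
}

names = list(SUBTITLES)
# Repeat each toy token list so the sampler has enough tokens to work with.
docs = [srt_to_tokens(SUBTITLES[n]) * 5 for n in names]
theta = lda_gibbs(docs, n_topics=2)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {hellinger(theta[i], theta[j]):.3f}")
```

Hellinger distance is used here only because it is symmetric and bounded in [0, 1], which makes topic-space distances easy to compare; cosine or Jensen-Shannon distance would be equally reasonable choices. On this toy input the two space-themed movies should end up closer to each other than either is to the cooking movie.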
Keywords
Subtitles Processing, Topic Modeling, Latent Dirichlet Allocation, Movie Similarity, Text Mining