Classification Of Turkish Documents Using Paragraph Vector
2018 International Conference on Artificial Intelligence and Data Processing (IDAP), 2018
Abstract
Text processing and mining have gained considerable traction recently, driven by rising interest in integrating Natural Language Processing with data analytics algorithms, in particular deep learning models. In this study, newspaper columnists are classified according to vector models built from their columns. This makes it possible not only to determine the author of an unattributed column, but also to form author profiles by grouping similar writing styles together. The DeepLearning4J (DL4J) Java library and its Doc2Vec class were chosen as the deep learning tools for text mining. Vector models for 5, 10, 15, and 20 authors were created from 20k newspaper columns. Two implementations of Paragraph Vectors, the Distributed Memory (PV-DM) and Distributed Bag of Words (PV-DBOW) models, were adapted and their performance was compared. The results show that some authors are clearly distinguishable from the others. Such a model can be used for author profile extraction, plagiarism detection, and identifying which authors have similar styles.
Keywords
PV-DBOW, PV-DM, DL4J, Paragraph Vectors, word2Vec, doc2Vec, text mining, author profile identification
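
The abstract does not spell out how a column is assigned to an author once the vectors exist. One common approach with paragraph vectors, assumed here purely for illustration, is to compare the vector of an unseen column against one vector per author and pick the most similar one by cosine similarity. The sketch below uses plain Java with hand-written toy vectors; the class name `AuthorMatcher`, the method names, and the author labels are hypothetical, and in practice the vectors would come from a trained PV-DM or PV-DBOW model rather than being typed in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AuthorMatcher {

    // Cosine similarity between two vectors of equal length.
    // Returns 0.0 if either vector has zero norm.
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the label of the author vector most similar to the column vector,
    // or null if the map is empty.
    public static String nearestAuthor(double[] columnVector,
                                       Map<String, double[]> authorVectors) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : authorVectors.entrySet()) {
            double score = cosine(columnVector, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors (assumed values, not from the study).
        Map<String, double[]> authors = new LinkedHashMap<>();
        authors.put("author_A", new double[]{0.9, 0.1, 0.0});
        authors.put("author_B", new double[]{0.1, 0.8, 0.3});

        double[] unseenColumn = {0.2, 0.7, 0.4};
        System.out.println(nearestAuthor(unseenColumn, authors)); // prints "author_B"
    }
}
```

The same similarity scores could also be compared between author vectors themselves, which is one way to explore which author styles are close to each other.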