Identification of Bursts in a Document Stream
msra
Abstract
We propose a method for extracting the 'burst of a word' related to a popular topic from a document stream, without assuming that documents arrive at a uniform rate. We regard blogs and BBSs as document streams so that we can apply the burst detection method originally proposed by Kleinberg. However, Kleinberg's algorithm cannot be applied to document streams in which documents are not uniformly distributed over time, so we extend the method to make it applicable to blogs and BBSs. We also describe experiments applying our method to blogs and BBSs and discuss the results.
… to identify useful information from these information sources, and a technique for automatically extracting useful information is needed. In previous research, Wakefield attempted to predict stock prices from reputations on BBSs(1), and Matsuo tried to summarize replies on BBSs to obtain useful information by using the reply-replied structure of BBSs(2). The burst detection algorithm proposed by Kleinberg(3) can be used to extract words that characterize a topic during a given period from these information sources. This algorithm regards the information sources as document streams (i.e., sets of time-stamped documents) and models the phenomenon in which the frequency of a specific word rises sharply when a topic attracts attention. Although Kleinberg originally applied the algorithm to identifying events in collections of e-mail, it can be used on other kinds of document streams as well. Kumar applied the method to hyperlinks between blog communities and extracted the bursty evolution of those communities(4), while Mane applied it to scientific publications to generate maps of major research topics(5). Initially, we tried to apply the algorithm to blogs collected by our blog collection system(6); however, it did not work properly.
This may have been because Kleinberg's algorithm assumes that documents arrive at a uniform rate, whereas the blog entries collected by our system are not necessarily distributed uniformly over time. We therefore present an extension of Kleinberg's method that can handle blogs and BBSs whose documents are not uniformly distributed. We also describe experiments on blogs and BBSs using the proposed method and discuss the results.
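To make the uniform-rate assumption concrete, here is a minimal Python sketch of the two-state version of Kleinberg's gap-based burst model: an automaton whose states emit exponentially distributed gaps between successive occurrences of a word, decoded with the Viterbi algorithm. This is Kleinberg's original formulation, not the extension proposed in this paper. The function name `kleinberg_two_state`, the default values `s = 2` and `gamma = 1`, and the sample timestamps are illustrative assumptions. The base rate is set to n / T, and that is where the uniform-arrival assumption enters. If the overall number of collected entries rises in some period, the gaps between occurrences of every word shrink, and the model may then report a burst even when the word's share of the documents has not changed.

```python
import math
from typing import List, Sequence


def kleinberg_two_state(times: Sequence[float], s: float = 2.0,
                        gamma: float = 1.0) -> List[int]:
    """Label each gap between successive occurrences as 0 (base) or 1 (burst).

    `times` are the sorted arrival times of the documents that contain the
    target word (hypothetical input format). Returns len(times) - 1 labels.
    """
    if len(times) < 2:
        return []
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(gaps)
    span = times[-1] - times[0]
    if span <= 0:
        raise ValueError("arrival times must span a positive interval")

    # Uniform-rate assumption: the base state expects the n gaps to be spread
    # evenly over the whole span (rate n / T); the burst state is s times faster.
    rates = (n / span, s * n / span)
    up_cost = gamma * math.log(n)  # cost of moving 0 -> 1; moving down is free

    def emit_cost(state: int, gap: float) -> float:
        # Negative log-density of an exponential gap: -ln(a * exp(-a * gap)).
        a = rates[state]
        return a * gap - math.log(a)

    # Viterbi over the two states; the automaton starts in the base state.
    prev = [0.0, math.inf]
    back: List[List[int]] = []
    for gap in gaps:
        cur = [0.0, 0.0]
        ptr = [0, 0]
        for j in (0, 1):
            cands = [prev[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best_i = 0 if cands[0] <= cands[1] else 1
            cur[j] = cands[best_i] + emit_cost(j, gap)
            ptr[j] = best_i
        back.append(ptr)
        prev = cur

    # Trace the cheapest state sequence backwards from the final gap.
    state = 0 if prev[0] <= prev[1] else 1
    states = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        states.append(state)
    states.reverse()
    return states


if __name__ == "__main__":
    # Made-up data: sparse, then seven closely spaced occurrences, then sparse.
    times = [0, 10, 20, 30, 31, 32, 33, 34, 35, 36, 37, 47, 57, 67]
    print(kleinberg_two_state(times))
    # -> [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

On these made-up timestamps, the seven one-unit gaps are labeled as a burst and the ten-unit gaps stay in the base state. The two-state automaton is the simplest case. Kleinberg's full formulation allows a hierarchy of increasingly fast states, with the same kind of upward transition cost between them.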