Discovering Trends in Text Databases

KDD, pp. 227-230 (1997)

Abstract

We describe a system we developed for identifying trends in text documents collected over a period of time. Trends can be used, for example, to discover that a company is shifting interests from one domain to another. Our system uses several data mining techniques in novel ways and demonstrates a method in which to visualize the trends. We als...

Introduction
  • The authors address the problem of discovering trends in text databases.
  • Each phrase has a history of its frequency of occurrence, obtained by partitioning the documents based upon their timestamps. The frequency in a time period is the number of documents that contain the phrase. (Other measures of frequency are possible, e.g. counting each occurrence of the phrase in a document.)
  • A trend is a specific subsequence of the history of a phrase that satisfies the user’s query over the histories.
  • The user may specify a “spike” query to find those phrases whose frequency of occurrence increased and then decreased.
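
The phrase histories described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `phrase_histories`, the `(period, text)` input format, and the sample documents are all hypothetical, and phrases are matched by plain case-insensitive substring search, which is a simplifying assumption.

```python
def phrase_histories(documents, phrases):
    """Return (periods, histories) for a list of (period, text) documents.

    periods is the sorted list of distinct time periods; histories maps each
    phrase to a list with one document count per period.
    Hypothetical helper: phrases are found by substring search (an assumption).
    """
    periods = sorted({period for period, _ in documents})
    index = {period: i for i, period in enumerate(periods)}
    histories = {phrase: [0] * len(periods) for phrase in phrases}
    for period, text in documents:
        lowered = text.lower()
        for phrase in phrases:
            # Count each document at most once per phrase (document frequency).
            if phrase.lower() in lowered:
                histories[phrase][index[period]] += 1
    return periods, histories


# Made-up sample data, partitioned by year.
docs = [
    (1990, "A heat removal system for chips"),
    (1990, "Emergency cooling apparatus"),
    (1991, "Improved heat removal in servers"),
    (1991, "Heat removal and emergency cooling"),
    (1992, "Emergency cooling for reactors"),
]
periods, hist = phrase_histories(docs, ["heat removal", "emergency cooling"])
print(periods)                    # [1990, 1991, 1992]
print(hist["heat removal"])       # [1, 2, 0]
print(hist["emergency cooling"])  # [1, 1, 1]
```

Switching to the alternative measure mentioned above (counting every occurrence) would mean replacing the membership test with `lowered.count(phrase.lower())`.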
Highlights
  • We address the problem of discovering trends in text databases
  • We propose the use of a shape definition language called SDL (Agrawal et al. 1995) to define the users’ queries and retrieve the associated objects
  • The PatentMiner prototype is a system we developed to discover trends among patents granted in different categories
  • We presented a system for identifying trends in text documents collected over a period of time
  • We described our experience in applying this system to the IBM Patent Server, a database of U.S. patents
  • Scale-up experiments show that our system, PatentMiner, scales approximately linearly with the number of documents
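
A shape query over a frequency history can be approximated as follows. This sketch does not use the syntax of the shape definition language cited above; it only illustrates the idea of turning a history into a sequence of transitions and searching it for a rise-then-fall "spike". The names `transitions` and `find_spikes`, the symbol names, and the `tolerance` parameter are hypothetical.

```python
def transitions(history, tolerance=0):
    """Label each step between consecutive periods as 'up', 'down', or 'stable'."""
    symbols = []
    for previous, current in zip(history, history[1:]):
        if current - previous > tolerance:
            symbols.append("up")
        elif previous - current > tolerance:
            symbols.append("down")
        else:
            symbols.append("stable")
    return symbols


def find_spikes(history, tolerance=0):
    """Return (start, end) index pairs into history for each run of one or
    more 'up' steps immediately followed by one or more 'down' steps."""
    symbols = transitions(history, tolerance)
    spikes = []
    i = 0
    while i < len(symbols):
        if symbols[i] != "up":
            i += 1
            continue
        start = i
        while i < len(symbols) and symbols[i] == "up":
            i += 1
        fall_start = i
        while i < len(symbols) and symbols[i] == "down":
            i += 1
        if i > fall_start:
            # The spike covers history[start] through history[i] inclusive.
            spikes.append((start, i))
    return spikes


print(find_spikes([1, 2, 5, 3, 1, 1, 4]))  # [(0, 4)]
print(find_spikes([1, 1, 1]))              # []
```

A phrase whose history yields at least one pair from `find_spikes` would satisfy this simplified spike query; the returned indices identify the matching subsequence, i.e. the trend.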
Conclusion
  • The authors presented a system for identifying trends in text documents collected over a period of time.
  • The authors' system uses several data mining techniques such as sequential patterns and shape queries in novel ways and demonstrates a trend visualization method.
  • The authors described the experience in applying this system to the IBM Patent Server, a database of U.S. patents.
  • Scale-up experiments show that the system, PatentMiner, scales approximately linearly with the number of documents.
Related Work
  • An approach to discovering interesting patterns and trend analysis on text documents was presented in (Feldman & Dagan 1995). Documents are labeled with a set of concepts, organized as a hierarchy. Treating the concept hierarchy as a distribution of probabilities, they identify several model distributions to which a given concept hierarchy can be compared. Interesting concepts are those that differ from their model distribution. Analyzing trends involves the comparison of concept distributions using old data with distributions using new data.

    In (Feldman & Hirsh 1996), the authors find associations between the keywords or concepts labeling the documents using background knowledge about relationships among the keywords. The purpose of the knowledge base is to supply unary or binary relations amongst the keywords labeling the documents.
References
  • Agrawal, R.; Psaila, G.; Wimmers, E.; and Zait, M. 1995. Querying shapes of histories. In Proceedings of the 21st International Conference on Very Large Databases.
  • Croft, W.; Turtle, H.; and Lewis, D. 1991. The use of phrases and structured queries in information retrieval. In 14th International ACM SIGIR Conference on Research and Development in Information Retrieval, 32-45.
  • Deerwester, S.; Dumais, S. T.; Furnas, G. W.; Landauer, T. K.; and Harshman, R. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6):391-407.
  • Feldman, R., and Dagan, I. 1995. Knowledge discovery in textual databases (KDT). In Proceedings of the 1st International Conference on Knowledge Discovery in Databases and Data Mining.
  • Feldman, R., and Hirsh, H. 1996. Mining associations in text in the presence of background knowledge. In Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining.
  • Gay, L., and Croft, W. 1990. Interpreting nominal compounds for information retrieval. Information Processing and Management 26(1):21-38.
  • Lewis, D., and Croft, W. 1990. Term clustering of syntactic phrases. In 13th International ACM SIGIR Conference on Research and Development in Information Retrieval, 385-404.
  • Renouf, A. 1993a. Making sense of text: automated approaches to meaning extraction. 17th International Online Information Meeting Proceedings 77-86.
  • Renouf, A. 1993b. What the linguist has to say to the information scientist. Journal of Document and Text Management 1(2):173-190.
  • Salton, G.; Allan, J.; Buckley, C.; and Singhal, A. 1994. Automatic analysis, theme generation, and summarization of machine readable texts. Science 264(5164):1421-1426.
  • Salton, G.; Singhal, A.; Buckley, C.; and Mitra, M. 1996. Automatic text decomposition using text segments and text themes. In Proceedings of Hypertext, 53-65.
  • Srikant, R., and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the Fifth International Conference on Extending Database Technology (EDBT).