Data Discovery and Anomaly Detection Using Atypicality: Theory.

IEEE Transactions on Information Theory (2019)

Abstract
A central question in the era of 'big data' is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that most of the value in the information in some applications is in the parts that deviate from the average, that are unusual, atypical. We define what we mean by 'atypical' in an axiomatic way as data that can be encoded with fewer bits in itself rather than using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real world data sets.
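The code-length comparison in the definition above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes binary data, a typical model that is i.i.d. Bernoulli with a known parameter `p_one`, and a simple two-part code (empirical entropy plus a parameter cost of half of log2 n bits) standing in for a universal source coder. The function names and the `overhead_bits` penalty, meant to represent the cost of signalling that the data is coded "in itself", are hypothetical.

```python
import math


def typical_bits(seq, p_one):
    """Bits needed to code a binary sequence with a fixed Bernoulli(p_one) model."""
    return sum(-math.log2(p_one if b else 1.0 - p_one) for b in seq)


def self_bits(seq, overhead_bits=1.0):
    """Bits needed to code the sequence 'in itself' (illustrative two-part code).

    Assumption: n * H(p_hat) bits for the data, 0.5 * log2(n) bits for the
    estimated parameter, plus a hypothetical fixed overhead.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    p_hat = sum(seq) / n
    if p_hat in (0.0, 1.0):
        entropy = 0.0  # a constant sequence has zero empirical entropy
    else:
        entropy = -p_hat * math.log2(p_hat) - (1.0 - p_hat) * math.log2(1.0 - p_hat)
    return n * entropy + 0.5 * math.log2(n) + overhead_bits


def is_atypical(seq, p_one=0.5, overhead_bits=1.0):
    """Flag the sequence when its own code is shorter than the typical code."""
    return self_bits(seq, overhead_bits) < typical_bits(seq, p_one)


if __name__ == "__main__":
    constant = [1] * 32      # far from a fair-coin source
    balanced = [0, 1] * 16   # consistent with a fair-coin source
    print(is_atypical(constant), is_atypical(balanced))
```

Under these assumptions, a run of 32 ones costs 32 bits with the fair-coin code but only a few bits with its own code, so it is flagged, while a balanced sequence gains nothing from its own code and pays the overhead, so it is not.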
Keywords
Anomaly detection,Encoding,Decoding,Big Data,Bioinformatics,Monitoring,Complexity theory