A Cube Model and Cluster Analysis for Web Access Sessions.

WEBKDD '01: Revised Papers from the Third International Workshop on Mining Web Log Data Across All Customers Touch Points (2001)

Abstract
Identifying the navigational patterns of casual visitors is an important step in online recommendation, which aims to convert casual visitors into customers in e-commerce. Clustering and sequential analysis are two primary techniques for mining navigational patterns from Web and application server logs. The characteristics of the log data and of the mining tasks call for new data representation methods and analysis algorithms to be tested in the e-commerce environment. In this paper we present a cube model to represent Web access sessions for data mining. The cube model organizes session data into three dimensions. The COMPONENT dimension represents a session as a set of ordered components {c_1, c_2, ..., c_P}, in which each component c_i indexes the i-th visited page in the session. Each component is associated with a set of attributes describing the page it indexes, such as the page ID, the category, and the time spent viewing the page. The attributes associated with each component are defined in the ATTRIBUTE dimension. The SESSION dimension indexes individual sessions. In the model, irregular sessions are converted to a regular data structure to which existing data mining algorithms can be applied, while the order of the page sequences is maintained. A rich set of page attributes is embedded in the model for different analysis purposes. We also present experimental results from clustering sessions with a partitional clustering algorithm. Because the sessions are essentially sequences of categories, the k-modes algorithm, which is designed for clustering categorical data, and a clustering method based on the Markov transition frequency (or probability) matrix are used to cluster the categorical sequences.
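The three dimensions described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy sessions, the attribute names, the padding value used to regularize short sessions, and all function names are assumptions made for illustration. The sketch builds a SESSION x COMPONENT x ATTRIBUTE structure, extracts the category sequence of a session, counts Markov transitions between categories, and computes the simple matching dissimilarity that k-modes uses for categorical data, applied here position by position.

```python
from collections import Counter

# Hypothetical toy sessions: each is an ordered list of
# (page_id, category, view_time_in_seconds) tuples.
SESSIONS = [
    [("p1", "home", 5.0), ("p7", "books", 30.0), ("p9", "cart", 12.0)],
    [("p1", "home", 3.0), ("p4", "music", 45.0)],
]

# The ATTRIBUTE dimension (names are assumptions).
ATTRIBUTES = ("page_id", "category", "view_time")

# Assumed filler for components beyond a session's actual length.
PAD = (None, None, 0.0)


def build_cube(sessions, num_components):
    """Return cube[s][p][a], padded or truncated to num_components per session."""
    cube = []
    for visits in sessions:
        row = [tuple(v) for v in visits[:num_components]]
        row += [PAD] * (num_components - len(row))
        cube.append(row)
    return cube


def category_sequence(cube, s):
    """Ordered categories of session s, skipping padded components."""
    idx = ATTRIBUTES.index("category")
    return [comp[idx] for comp in cube[s] if comp[idx] is not None]


def transition_counts(sequence):
    """Markov transition frequencies: counts of consecutive category pairs."""
    return Counter(zip(sequence, sequence[1:]))


def simple_matching(cube, s1, s2):
    """Number of component positions at which two sessions' categories differ."""
    idx = ATTRIBUTES.index("category")
    return sum(a[idx] != b[idx] for a, b in zip(cube[s1], cube[s2]))


cube = build_cube(SESSIONS, num_components=3)
print(category_sequence(cube, 0))
print(transition_counts(category_sequence(cube, 0)))
print(simple_matching(cube, 0, 1))
```

Padding every session to the same number of components is one possible way to obtain the regular structure the abstract mentions while preserving page order. Counting a padded position as a mismatch in `simple_matching` is likewise a design choice of this sketch rather than something the abstract specifies.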
Keywords
casual visitor, categorical data, data mining, existing data mining algorithm, log data, new data representation method, regular data structure, page ID, page attribute, page sequence, cluster analysis, cube model, Web access sessions