Supervised Human-Guided Data Exploration.

PKDD/ECML Workshops(2019)

引用 0|浏览24
暂无评分
摘要
An exploratory data analysis system should be aware of what a user already knows and what the user wants to know of the data. Otherwise it is impossible to provide the user with truly informative and useful views of the data. In our recently introduced framework for human-guided data exploration (Puolamaki et al. [20]), both the user's knowledge and objectives are modelled as distributions over data, parametrised by tile constraints. This makes it possible to show the users the most informative views given their current knowledge and objectives. Often the data, however, comes with a class label and the user is interested only of the features informative related to the class. In non-interactive settings there exist dimensionality reduction methods, such as supervised PCA (Barshan et al. [1]), to make such visualisations, but no such method takes the user's knowledge or objectives into account. Here, we formulate an information criterion for supervised human-guided data exploration to find the most informative views about the class structure of the data by taking both the user's current knowledge and objectives into account. We study experimentally the scalability of our method for interactive use, and stability with respect to the size of the class of interest. We show that our method gives understandable and useful results when analysing real-world datasets, and a comparison to SPCA demonstrates the effect of the user's background knowledge. The implementation will be released as an open source software library.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要