Towards exploratory hypothesis testing and analysis

Guimei Liu, Mengling Feng, Yue Wang, Limsoon Wong, See-Kiong Ng, Tzia Liang Mah, Edmund Jon Deoon Lee

Data Engineering (2011)

Abstract

Hypothesis testing is a well-established tool for scientific discovery. Conventional hypothesis testing is carried out in a hypothesis-driven manner. A scientist must first formulate a hypothesis based on his/her knowledge and experience, and then devise a variety of experiments to test it. Given the rapid growth of data, it has become virtually impossible for a person to manually inspect all the data to find all the interesting hypotheses for testing. In this paper, we propose and develop a data-driven system for automatic hypothesis testing and analysis. We define a hypothesis as a comparison between two or more sub-populations. We find sub-populations for comparison using frequent pattern mining techniques and then pair them up for statistical testing. We also generate additional information for further analysis of the hypotheses that are deemed significant. We conducted a set of experiments to show the efficiency of the proposed algorithms, and the usefulness of the generated hypotheses. The results show that our system can help users (1) identify significant hypotheses; (2) isolate the reasons behind significant hypotheses; and (3) find confounding factors that form Simpson's Paradoxes with discovered significant hypotheses.
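The pipeline outlined in the abstract (find sub-populations, pair them up, test them statistically, then look for confounders) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a brute-force enumeration in place of a real frequent pattern miner, a 2x2 chi-square test as the statistical test, and a hypothetical toy dataset whose counts were chosen so that a Simpson's Paradox reversal appears. All function names and attribute names (`treatment`, `size`, `outcome`) are illustrative assumptions.

```python
from itertools import combinations
from math import erfc, sqrt


def make_records():
    """Hypothetical toy data: (treatment, size, successes, total)."""
    counts = [("A", "small", 80, 90), ("A", "large", 190, 260),
              ("B", "small", 230, 270), ("B", "large", 50, 80)]
    records = []
    for treatment, size, succ, total in counts:
        base = {"treatment": treatment, "size": size}
        records += [dict(base, outcome="success") for _ in range(succ)]
        records += [dict(base, outcome="failure") for _ in range(total - succ)]
    return records


def frequent_patterns(records, attrs, min_support):
    """Brute-force stand-in for a frequent pattern miner.

    A pattern is a set of (attribute, value) items; it is kept if at
    least min_support records match every item.
    """
    items = sorted({(a, r[a]) for r in records for a in attrs})
    patterns = {}
    for k in range(1, len(attrs) + 1):
        for combo in combinations(items, k):
            if len({a for a, _ in combo}) < k:
                continue  # at most one value per attribute
            rows = [r for r in records if all(r[a] == v for a, v in combo)]
            if len(rows) >= min_support:
                patterns[frozenset(combo)] = rows
    return patterns


def paired_hypotheses(patterns):
    """Pair patterns that differ in the value of exactly one attribute."""
    for p, q in combinations(sorted(patterns, key=sorted), 2):
        only_p, only_q = p - q, q - p
        if len(only_p) == 1 and len(only_q) == 1:
            (a1, v1), = only_p
            (a2, v2), = only_q
            if a1 == a2:
                yield p & q, a1, (v1, patterns[p]), (v2, patterns[q])


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / den
    return chi2, erfc(sqrt(chi2 / 2))  # survival function for df = 1


def evaluate_hypotheses(records, attrs, target, positive, min_support):
    """Test every paired sub-population comparison on the target rate."""
    results = []
    patterns = frequent_patterns(records, attrs, min_support)
    for context, attr, (v1, rows1), (v2, rows2) in paired_hypotheses(patterns):
        a = sum(r[target] == positive for r in rows1)
        c = sum(r[target] == positive for r in rows2)
        _, p = chi2_2x2(a, len(rows1) - a, c, len(rows2) - c)
        results.append({"context": dict(context), "attr": attr,
                        "rates": {v1: a / len(rows1), v2: c / len(rows2)},
                        "p": p})
    return results


if __name__ == "__main__":
    for h in evaluate_hypotheses(make_records(), ["treatment", "size"],
                                 "outcome", "success", min_support=50):
        print(h["context"], h["attr"], h["rates"], round(h["p"], 4))
```

In this toy data, treatment B has the higher overall success rate, while treatment A has the higher rate within each `size` stratum. Comparing the same hypothesis under different contexts is one simple way to surface such a reversal, with `size` acting as the confounding factor.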

Keywords

statistical testing, data-driven system, hypothesis testing, automatic hypothesis testing, towards exploratory hypothesis testing, Simpson paradox, conventional hypothesis testing, frequent pattern mining techniques, additional information, confounding factor, data mining, exploratory hypothesis testing, exploratory hypothesis analysis, scientific discovery, interesting hypothesis, significant hypothesis, form Simpson, hypothesis test, probability, space exploration
