Learning and testing junta distributions over hypercubes

user-5f1696ff4c775ed682f5929f(2015)

引用 0|浏览22
暂无评分
摘要
Many tasks related to the analysis of high-dimensional datasets can be formalized as problems involving learning or testing properties of distributions over a high-dimensional domain. In this work, we initiate the study of the following general question: when many of the dimensions of the distribution correspond to "irrelevant" features in the associated dataset, can we learn the distribution efficiently? We formalize this question with the notion of junta distribution. The distribution D over {0, 1}n is a k-junta distribution if the probability mass function p of D is a k-junta-- i. e., if there is a set J [subset][n] of at most k coordinates such that for every x [set membership] {0, 1}7, the value of p(x) is completely determined by the value of x on the coordinates in J. We show that it is possible to learn k-junta distributions with a number of samples that depends only logarithmically on the total number n of dimensions. We give two proofs of this result; one using the cover method and one by developing a Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993). We also consider the problem of testing whether an unknown distribution is a k-junta distribution. We introduce an algorithm for this task with sample complexity Õ(2n/²k⁴) and show that this bound is nearly optimal for constant values of k. As a byproduct of the analysis of the algorithm, we obtain an optimal bound on the number of samples required to test a weighted collection of distribution for uniformity. Finally, we establish the sample complexity for learning and testing other classes of distributions related to junta-distributions. Notably, we show that the task …
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要