Beyond Set Disjointness: The Communication Complexity Of Finding The Intersection
PODC(2014)
Abstract
We consider the following fundamental communication problem: there is data that is distributed among servers, and the servers want to compute the intersection of their data sets, e.g., the common records in a relational database. They want to do this with as little communication and as few messages (rounds) as possible. They are willing to use randomization, and fail with a tiny probability. Given a protocol for computing the intersection, it can also be used to compute the exact Jaccard similarity, the rarity, the number of distinct elements, and joins between databases. Computing the intersection is at least as hard as the set disjointness problem, which asks whether the intersection is empty.
Formally, in the two-server setting, the players hold subsets $S, T \subseteq [n]$. In many realistic scenarios, the sizes of $S$ and $T$ are significantly smaller than $n$, so we impose the constraint that $|S|, |T| \le k$. We study the minimum number of bits the parties need to communicate in order to compute the intersection set $S \cap T$, given a certain number $r$ of messages that are allowed to be exchanged. While $O(k \log(n/k))$ bits is achieved trivially and deterministically with a single message, we ask what is possible with more than one message and with randomization. We give a smooth communication/round tradeoff which shows that with $O(\log^* k)$ rounds, $O(k)$ bits of communication is possible, which improves upon the trivial protocol by an order of magnitude. This is in contrast to other basic problems such as computing the union or symmetric difference, for which $\Omega(k \log(n/k))$ bits of communication is required for any number of rounds. For two players, known lower bounds for the easier problem of set disjointness imply our algorithms are optimal up to constant factors in communication and number of rounds.
We extend our protocols to m-player protocols, obtaining an optimal O(mk) bits of communication with a similarly small number of rounds.
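To make the baseline concrete, the following is a minimal sketch of the trivial one-message protocol mentioned in the abstract, not of the paper's multi-round randomized protocol. It assumes Alice encodes $S$ as an index into the family of all subsets of $[n]$ of size at most $k$, so the message length is $\lceil \log_2 \sum_{i \le k} \binom{n}{i} \rceil \le \lceil k \log_2(en/k) \rceil$ bits. The function name `trivial_one_round` and the choice of encoding are illustrative assumptions.

```python
from math import comb


def trivial_one_round(S, T, n, k):
    """Hypothetical baseline: Alice sends S in a single message, Bob outputs S & T.

    Returns the intersection and the number of bits needed to name one
    subset of [n] of size at most k (the length of Alice's message).
    """
    if len(S) > k or len(T) > k:
        raise ValueError("both sets must have at most k elements")
    if any(not (1 <= x <= n) for x in S | T):
        raise ValueError("elements must lie in [n] = {1, ..., n}")

    # Number of subsets of [n] with at most k elements.
    num_subsets = sum(comb(n, i) for i in range(k + 1))

    # Exact ceil(log2(num_subsets)) using integer arithmetic only.
    bits = (num_subsets - 1).bit_length()

    return S & T, bits


if __name__ == "__main__":
    common, bits = trivial_one_round({1, 5, 9}, {5, 9, 12}, n=1000, k=10)
    print(sorted(common), bits)
```

The message length returned here grows as $k \log(n/k)$, which is the cost the abstract's $O(k)$-bit, $O(\log^* k)$-round protocol improves on.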