Approximate Distributed Joins in Apache Spark
arXiv: Distributed, Parallel, and Cluster Computing, Volume abs/1805.05874, 2018.
The join operation is a fundamental building block of parallel data processing. Unfortunately, computing an equi-join across massive datasets is very resource-intensive. The approximate computing paradigm allows users to trade accuracy for lower latency in expensive data processing operations. The equi-join operator is thus a natural candi...