Parrot: A Progressive Analysis System on Large Text Collections

DATA SCIENCE AND ENGINEERING(2020)

引用 8|浏览59
暂无评分
摘要
The size of textual data continues to grow along with the need for timely and cost-effective analysis, while the growth of computation power cannot keep up with the growth of data. The delays when processing huge textual data can negatively impact user activity and insight. This calls for a paradigm shift from blocking fashion to progressive processing. In this paper, we propose a sample-based progressive processing model that focuses on term frequency calculation on text. The model is based on an incremental execution engine and will calculate a series of approximate results for a single query in a progressive way to provide a smooth trade-off between accuracy and latency. As a part, we proposed a new variant of the bootstrap technique to quantify result error progressively. We implemented this method in our system called Parrot on top of Apache Spark and used real-world data to test its performance. Experiments demonstrate that our method is 2.4×–19.7× faster to get a result within 1% error while the confidence interval always covers the accurate results very well.
更多
查看译文
关键词
Approximate query processing,Text data analytics,Term frequency,Bootstrap
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要