Performance Analysis via Hadoop: A Study of between text and .orc file extensions

Luana Thamiris Da Silva de Oliveira, Maristela Holanda, Márcio de Carvalho Victorino

2023 18th Iberian Conference on Information Systems and Technologies (CISTI) (2023)

Abstract
In the age of Big Data, access to open data supports increasingly in-depth studies on social, economic, and political themes. A practical example is the analysis of the distribution of Emergency Aid in Brazil, approved by the National Congress in 2020 in response to the COVID-19 pandemic. However, processing massive databases poses challenges, and tools capable of processing large volumes of data in a short time are increasingly needed. Apache Hadoop is an important framework that enables distributed processing of massive databases, reducing query times. This paper compares two file formats (.txt and .ORC) and evaluates their performance when processing Hive queries on the Emergency Aid database. The results showed significant differences in average execution time when adopting the .ORC file format. On the other hand, the expected discrepancy in the average value of the aid among States was not confirmed, since all States received similar values.
Keywords
Massive Database, Hadoop, Hive, Emergency Aid, open data