A Data Parallel Algorithm for XML DOM Parsing

DATABASE AND XML TECHNOLOGIES, PROCEEDINGS(2009)

引用 20|浏览0
暂无评分
摘要
The extensible markup language XML has become the de facto standard for information representation and interchange on the Internet. XML parsing is a core operation performed on an XML document for it to be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores, and will offer a high degree of parallelism in hardware. We propose a data parallel algorithm called ParDOM for XML DOM parsing, that builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, an XML document is partitioned into chunks and parsed in parallel. In the second phase, partial DOM node tree structures created during the first phase, are linked together (in parallel) to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme --- each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data parallel programming model that supports map and sort operations. Through empirical evaluation, we show that ParDOM yields better scalability than PXP [23] --- a recently proposed parallel DOM parsing algorithm --- on commodity multicore processors. Furthermore, ParDOM can process a wide-variety of XML datasets with complex structures which PXP fails to parse.
更多
查看译文
关键词
parallel dom parsing algorithm,xml parsing,complete dom node tree,xml dom parsing,xml tag,xml data,data parallel algorithm,xml document,data parallel programming model,xml datasets,tree structure,cost effectiveness,complex structure,extensible markup language,programming model,multicore processors
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要