Answering Questions About Data Visualizations Using Efficient Bimodal Fusion

Kushal Kafle, Robik Shrestha, Brian L. Price, Scott Cohen, Christopher Kanan

2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020

Abstract

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.
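
The two stages named in the abstract (fusing question and image features, then aggregating the fused embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the layers, so the sketch assumes that fusion concatenates the question vector to every spatial cell of an image feature grid and applies one shared linear layer with ReLU, and it substitutes a toy tanh recurrence for whatever recurrent aggregator the paper actually uses. All function names, weights, and the `mix` parameter are hypothetical.

```python
import math

def fuse_cell(image_feat, question_feat, weights, biases):
    # Concatenate the two modalities, then apply one shared linear layer + ReLU.
    joint = image_feat + question_feat
    return [max(0.0, sum(w * x for w, x in zip(row, joint)) + b)
            for row, b in zip(weights, biases)]

def fuse_grid(image_grid, question_feat, weights, biases):
    # Apply the same fusion at every spatial location (assumed design choice).
    return [[fuse_cell(cell, question_feat, weights, biases) for cell in row]
            for row in image_grid]

def aggregate(fused_grid, mix=0.5):
    # Toy recurrent aggregation (a stand-in, not the paper's aggregator):
    # scan cells in raster order and update a hidden state at each step.
    cells = [cell for row in fused_grid for cell in row]
    hidden = [0.0] * len(cells[0])
    for cell in cells:
        hidden = [math.tanh(mix * h + (1.0 - mix) * x)
                  for h, x in zip(hidden, cell)]
    return hidden

# Tiny example: a 2x2 grid of 2-d image features and a 1-d question feature.
image_grid = [[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 0.0]]]
question_feat = [1.0]
weights = [[1.0, 0.0, 1.0],    # output unit 0: image dim 0 plus question
           [0.0, 1.0, -1.0]]   # output unit 1: image dim 1 minus question
biases = [0.0, 0.0]

fused = fuse_grid(image_grid, question_feat, weights, biases)
summary = aggregate(fused)  # fixed-size vector a classifier could consume
```

In this sketch the question vector modulates every location of the image grid before any pooling happens, which is one plausible reading of "bimodal embeddings"; the final `summary` vector plays the role of the aggregated representation that an answer classifier would take as input.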

Keywords

data visualizations, optical character recognition, CQA algorithm, PReFIL, image features, bimodal fusion, chart question answering, visual question answering, natural-image VQA algorithms, parallel recurrent fusion of image and language, question features, document text
