Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
CoRR (2024)
Abstract
Key-value relations are prevalent in Visually-Rich Documents (VRDs), often
depicted in distinct spatial regions accompanied by specific color and font
styles. These non-textual cues serve as important indicators that greatly
enhance human comprehension and acquisition of such relation triplets. However,
current document AI approaches often fail to consider this valuable prior
information related to visual and spatial features, resulting in suboptimal
performance, particularly when dealing with limited examples. To address this
limitation, our research focuses on few-shot relational learning, specifically
targeting the extraction of key-value relation triplets in VRDs. Given the
absence of a suitable dataset for this task, we introduce two new few-shot
benchmarks built upon existing supervised benchmark datasets. Furthermore, we
propose a variational approach that incorporates relational 2D-spatial priors
and prototypical rectification techniques. This approach aims to generate
relation representations that are aware of both the spatial context and unseen
relations, in a manner similar to human perception. Experimental results
demonstrate that our method outperforms existing approaches, and the study
opens up new possibilities for practical applications.
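To make the task concrete, below is a minimal sketch of prototype-based few-shot relation classification over key-value candidates, where each pair is represented by a text embedding concatenated with a simple 2D-spatial feature derived from bounding boxes. This is not the authors' variational model with prototypical rectification; it only illustrates the baseline structure such a method builds on, and every name here (`spatial_feature`, `embed_pair`, the relation labels) is an illustrative assumption.

```python
import numpy as np

def spatial_feature(key_box, value_box):
    """Relative 2D offset between key and value bounding boxes
    (normalized [x0, y0, x1, y1]) -- a simple stand-in for the
    relational 2D-spatial priors described in the abstract."""
    return np.asarray(value_box, dtype=float) - np.asarray(key_box, dtype=float)

def embed_pair(text_emb, key_box, value_box):
    """Concatenate a precomputed text embedding with the spatial feature."""
    return np.concatenate([np.asarray(text_emb, dtype=float),
                           spatial_feature(key_box, value_box)])

def prototypes(support_embs, support_labels):
    """Mean embedding per relation class (standard prototypical networks)."""
    classes = sorted(set(support_labels))
    protos = np.stack([
        np.mean([e for e, y in zip(support_embs, support_labels) if y == c], axis=0)
        for c in classes
    ])
    return classes, protos

def classify(query_emb, classes, protos):
    """Assign the query pair to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(protos - query_emb, axis=1)
    return classes[int(np.argmin(dists))]

# Tiny 2-way, 2-shot usage example with random stand-in text embeddings.
rng = np.random.default_rng(0)
support = [
    embed_pair(rng.normal(size=8), [.1, .1, .3, .15], [.35, .1, .6, .15]),
    embed_pair(rng.normal(size=8), [.1, .2, .3, .25], [.35, .2, .6, .25]),
    embed_pair(rng.normal(size=8), [.1, .5, .3, .55], [.1, .6, .6, .9]),
    embed_pair(rng.normal(size=8), [.2, .5, .4, .55], [.2, .6, .7, .9]),
]
labels = ["right_of_key", "right_of_key", "below_key", "below_key"]
classes, protos = prototypes(support, labels)
query = embed_pair(rng.normal(size=8), [.1, .3, .3, .35], [.35, .3, .6, .35])
print(classify(query, classes, protos))
```

In this baseline the prototype is a plain mean over support embeddings; the paper's rectification step would instead adjust these prototypes using spatial priors so that unseen relation types are represented more faithfully.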