Anatomical Structure-Guided Medical Vision-Language Pre-training
CoRR (2024)
Abstract
Learning medical visual representations through vision-language pre-training
has made remarkable progress. Despite the promising performance, two
challenges remain: local alignment lacks interpretability and clinical
relevance, and the internal and external representation learning of
image-report pairs is insufficient. To address these issues, we propose an
Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw
reports into (anatomical region, finding, existence) triplets and fully
utilize each element as supervision to enhance representation learning. For
the anatomical region, we design an automatic anatomical region-sentence
alignment paradigm in collaboration with radiologists, treating regions as
the minimum semantic units for fine-grained local alignment. For the finding
and existence, we regard them as image tags and apply an image-tag
recognition decoder to associate image features with their respective tags
within each sample; we further construct soft labels for contrastive learning
to improve the semantic association across different image-report pairs. We
evaluate the proposed ASG framework on two
downstream tasks, including five public benchmarks. Experimental results
demonstrate that our method outperforms the state-of-the-art methods.
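The triplet parsing step can be illustrated with a toy keyword-based sketch. This is not the authors' actual parser (the paper's extraction pipeline is more sophisticated and radiologist-informed); the region and finding vocabularies and the negation cues below are hypothetical, chosen only to show the (anatomical region, finding, existence) structure.

```python
from dataclasses import dataclass

# Assumed toy vocabularies; the real ASG pipeline uses a richer ontology.
REGIONS = {"left lung", "right lung", "heart", "mediastinum"}
FINDINGS = {"opacity", "effusion", "cardiomegaly"}
NEGATION_CUES = ("no ", "without ", "absent")

@dataclass(frozen=True)
class Triplet:
    region: str    # anatomical region, e.g. "left lung"
    finding: str   # radiological finding, e.g. "effusion"
    exists: bool   # whether the finding is asserted present

def parse_sentence(sentence: str) -> list[Triplet]:
    """Extract triplets by matching known region/finding keywords and
    flagging negation via a simple cue-word check."""
    s = sentence.lower()
    exists = not any(cue in s for cue in NEGATION_CUES)
    return [
        Triplet(region, finding, exists)
        for region in REGIONS if region in s
        for finding in FINDINGS if finding in s
    ]
```

Each parsed sentence then yields supervision at three granularities: the region anchors local alignment, while the finding and existence act as image tags.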
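The soft-label contrastive idea can be sketched as follows: instead of treating every other image-report pair in a batch as a pure negative, pairs that share tags (findings/existence) receive partial positive weight. This is a minimal NumPy sketch under assumed inputs (L2-normalised embeddings, multi-hot tag vectors); the function name, temperature value, and the choice of cosine tag similarity are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_label_contrastive_loss(img_emb, txt_emb, tag_vecs, temperature=0.07):
    """Image-to-text contrastive loss with soft targets derived from
    tag overlap between samples (illustrative sketch).

    img_emb, txt_emb: (N, D) L2-normalised embeddings.
    tag_vecs:         (N, T) multi-hot tag vectors per sample.
    """
    # Pairwise image-text similarity logits.
    logits = img_emb @ txt_emb.T / temperature                  # (N, N)

    # Soft targets: cosine similarity between tag vectors, row-normalised
    # so each row is a probability distribution over batch samples.
    tag_sim = tag_vecs @ tag_vecs.T
    norms = np.linalg.norm(tag_vecs, axis=1, keepdims=True)
    tag_sim = tag_sim / np.clip(norms @ norms.T, 1e-8, None)
    targets = tag_sim / tag_sim.sum(axis=1, keepdims=True)

    # Cross-entropy between soft targets and the softmax over logits.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_prob).sum(axis=1).mean())
```

With one-hot tag vectors (no shared tags) the targets reduce to the identity matrix and the loss degenerates to the standard one-directional InfoNCE objective, which makes the soft-label variant a strict generalisation.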