AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation
arXiv (2024)
Abstract
In the past decades, deep neural networks, particularly convolutional neural
networks, have achieved state-of-the-art performance in a variety of medical
image segmentation tasks. Recently, the introduction of the vision transformer
(ViT) has significantly altered the landscape of deep segmentation models.
There has been a growing focus on ViTs, driven by their excellent performance
and scalability. However, we argue that the current design of the vision
transformer-based UNet (ViT-UNet) segmentation models may not effectively
handle the heterogeneous appearance (e.g., varying shapes and sizes) of objects
of interest in medical image segmentation tasks. To tackle this challenge, we
present a structured approach to introduce spatially dynamic components to the
ViT-UNet. This adaptation enables the model to effectively capture features of
target objects with diverse appearances. This is achieved by three main
components: \textbf{(i)} deformable patch embedding; \textbf{(ii)} spatially
dynamic multi-head attention; \textbf{(iii)} deformable positional encoding.
These components were integrated into a novel architecture, termed AgileFormer.
AgileFormer is a spatially agile ViT-UNet designed for medical image
segmentation. Experiments on three segmentation tasks using publicly available
datasets demonstrated the effectiveness of the proposed method. The code is
available at
\href{https://github.com/sotiraslab/AgileFormer}{https://github.com/sotiraslab/AgileFormer}.
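To make the first component concrete, the sketch below illustrates the idea behind deformable patch embedding: instead of tokenizing the image on a rigid grid, each patch center is shifted by a learned per-patch offset, and features are read off at the shifted (fractional) locations via bilinear interpolation. This is a minimal NumPy illustration of the sampling mechanism only, not the authors' implementation; the function names, the per-patch `(dy, dx)` offset layout, and the use of a plain grid of centers are all assumptions for exposition (in the actual model, offsets would be predicted by a small learned layer).

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample a 2D array `img` at fractional coordinates (y, x)."""
    H, W = img.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    # Weighted average of the four surrounding pixels.
    return (img[y0, x0] * (1 - dy) * (1 - dx)
            + img[y0, x1] * (1 - dy) * dx
            + img[y1, x0] * dy * (1 - dx)
            + img[y1, x1] * dy * dx)

def deformable_patch_centers(H, W, patch, offsets):
    """Rigid grid of patch centers, each shifted by a learned offset.

    `offsets` has shape (H // patch, W // patch, 2) holding (dy, dx) per
    patch; with all-zero offsets this reduces to standard patch embedding.
    """
    gy, gx = H // patch, W // patch
    centers = np.zeros((gy, gx, 2))
    for i in range(gy):
        for j in range(gx):
            cy = i * patch + patch / 2 + offsets[i, j, 0]
            cx = j * patch + patch / 2 + offsets[i, j, 1]
            # Clamp so deformed centers stay inside the image.
            centers[i, j] = (np.clip(cy, 0, H - 1), np.clip(cx, 0, W - 1))
    return centers
```

With zero offsets the centers coincide with the ordinary non-overlapping patch grid; non-zero offsets let tokens gather evidence from irregularly shaped regions, which is the property the abstract argues matters for anatomically heterogeneous targets.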