Adaptive Patching for High-resolution Image Segmentation with Transformers
arXiv (2024)
Abstract
Attention-based models are proliferating in image analytics, including
segmentation. The standard way to feed an image to a transformer encoder is to
divide it into patches and pass the patches to the model as a linear sequence
of tokens. For high-resolution images, e.g. microscopic pathology images, the
compute and memory cost of attention, which is quadratic in the number of
tokens, prohibits the use of an attention-based model with the small patch
sizes that are favorable in segmentation. Existing solutions rely on either
custom, complex multi-resolution models or approximate attention schemes. We
take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC and
adaptively patch the images, as a pre-processing step, based on the image
details, reducing the number of patches fed to the model by orders of
magnitude. The method has negligible overhead and, because it is purely a
pre-processing step, can be adopted by any attention-based model without
friction. We demonstrate superior segmentation quality over SoTA segmentation
models on real-world pathology datasets while achieving a geomean speedup of
6.9× for resolutions up to 64K^2, on up to 2,048 GPUs.
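
The idea of AMR-style adaptive patching can be illustrated with a minimal sketch. The abstract does not specify the splitting criterion or data structure, so everything below is an assumption made for illustration: a quadtree subdivision, a pixel standard-deviation test as the measure of "image detail", and the hypothetical names `adaptive_patches`, `min_size`, `max_size`, and `threshold`. It is not the authors' implementation.

```python
import numpy as np


def adaptive_patches(image, min_size=4, max_size=32, threshold=10.0):
    """Quadtree-style adaptive patching (illustrative assumption only).

    Starts from a coarse grid of max_size x max_size blocks and recursively
    splits a block into four quadrants while its pixel standard deviation
    exceeds `threshold` and it is still larger than `min_size`.
    Returns a list of (row, col, size) leaf patches.
    """
    h, w = image.shape
    assert h == w and h % max_size == 0, "expects a square image tiled by max_size"
    ratio = max_size // min_size
    assert max_size % min_size == 0 and ratio & (ratio - 1) == 0, \
        "max_size / min_size must be a power of two"

    patches = []

    def split(r, c, size):
        block = image[r:r + size, c:c + size]
        if size > min_size and block.std() > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
        else:
            patches.append((r, c, size))

    for r in range(0, h, max_size):
        for c in range(0, w, max_size):
            split(r, c, max_size)
    return patches


# Synthetic example: a flat 256x256 image with one detailed 32x32 corner.
image = np.zeros((256, 256), dtype=np.float64)
image[:32, :32] = np.indices((32, 32)).sum(axis=0) % 2 * 255  # checkerboard

patches = adaptive_patches(image)
uniform_tokens = (256 // 4) ** 2   # uniform grid at the smallest patch size
adaptive_tokens = len(patches)     # leaves produced by the quadtree
```

In this synthetic case the uniform grid yields 4,096 tokens and the adaptive split yields 127, roughly 32 times fewer. Because attention cost grows quadratically with the number of tokens, the number of token pairs drops by about three orders of magnitude. A model that expects fixed-size tokens would still need each leaf resampled to a common patch size; that step is omitted here.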