UDON: A case for offloading to general purpose compute on CXL memory
arXiv (2024)
Abstract
Upcoming CXL-based disaggregated memory devices feature special-purpose units
that offload compute to near-memory. In this paper, we explore opportunities for
offloading compute to general-purpose cores on CXL memory devices, thereby
enabling greater utility and diversity of offload.
We study two classes of popular memory-intensive applications, ML inference
and vector databases, as candidates for computational offload. The study uses
Arm AArch64-based dual-socket NUMA systems to emulate CXL type-2 devices.
Our study shows promising results. With our ML inference model partitioning
strategy for compute offload, we can place up to 90% [...] with
just 20% [...]. Offloading Hierarchical Navigable Small World
(HNSW) kernels in vector databases can provide up to 6.87× performance
improvement with under 10% [...].