Occamy: Memory-efficient GPU Compiler for DNN Inference.

DAC (2023)

Abstract
This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of input and output tensors, and their liveness within the operation. Across all the operations, Occamy analyzes liveness of all the tensors, generates a memory pool after calculating the maximum required memory size, and schedules when and where to place each tensor in the memory pool. Compared to PyTorch, on an integrated embedded GPU for six DNNs, Occamy reduces the memory usage by 34.6% and achieves a geometric mean speedup of 1.25x.
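The abstract describes planning tensor placement in a shared memory pool based on liveness, so that tensors whose live ranges never overlap can reuse the same bytes. The following is a minimal illustrative sketch of that general idea, not Occamy's actual algorithm; the tensor names, sizes, and live ranges are hypothetical.

```python
# Illustrative sketch of liveness-based memory pooling (not Occamy's actual
# algorithm): assign each tensor an offset in one shared pool so that tensors
# with non-overlapping live ranges share memory, then report the maximum
# required pool size.

def plan_pool(tensors):
    """tensors: list of (name, size, first_op, last_op).
    Returns (pool_size, {name: offset})."""
    placed = []   # (offset, size, first_op, last_op)
    offsets = {}
    # Place larger tensors first to reduce fragmentation.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Intervals occupied by tensors whose live ranges overlap this one.
        busy = sorted((off, off + sz) for off, sz, f, l in placed
                      if not (last < f or l < first))
        offset = 0
        for b_start, b_end in busy:
            if offset + size <= b_start:
                break                     # fits in the gap before this interval
            offset = max(offset, b_end)   # otherwise skip past it
        placed.append((offset, size, first, last))
        offsets[name] = offset
    pool_size = max((off + sz for off, sz, _, _ in placed), default=0)
    return pool_size, offsets

# Hypothetical example: A (ops 0-1) and C (ops 2-3) are never live at the
# same time, so they share offset 0; B overlaps both and gets its own slot.
tensors = [("A", 4, 0, 1), ("B", 2, 1, 2), ("C", 4, 2, 3)]
size, offs = plan_pool(tensors)
```

With these inputs the pool needs 6 units instead of the 10 that per-tensor allocation would require, mirroring the kind of reuse the paper exploits.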
Keywords
DNN inference, DNN model, DNN operation, integrated embedded GPU, maximum required memory size, memory pool, memory usage, memory-efficient DNN compiler, memory-efficient GPU compiler, Occamy analyzes liveness, output tensors, tensor