DAI-NET: Toward communication-aware collaborative training for the industrial edge

FUTURE GENERATION COMPUTER SYSTEMS - THE INTERNATIONAL JOURNAL OF ESCIENCE (2024)

Abstract
The industrial edge generates an abundance of spatially distributed and dynamic data that needs to remain on-site for privacy and security reasons. Collaborative training at the edge can leverage this data to refine pre-trained models locally for specific industrial tasks and environments, and to adapt them to local changes for enhanced performance, agility, and resilience. However, communication between the devices during training is a key bottleneck and is not modelled by existing frameworks such as MXNet, PyTorch and TensorFlow. This paper introduces DAI-NET, a co-simulation framework for examining communication and its associated costs, and provides results from an implementation using Python, OMNeT++ and INET. To validate it and showcase its utility, the developed platform is applied in the analysis of (i) the performance and cost of collaboratively training a Multilayer Perceptron model, and (ii) the influence of computational heterogeneity. Communication costs generated during training are captured at the device and system levels. In computationally heterogeneous clusters, the root cause of stragglers is exposed. In addition, the key performance contributors are identified as a cluster's computation capability and the variation in the relative computation capabilities of its devices. This study is particularly useful for Artificial Intelligence of Things (AIoT) systems, whose bandwidth and energy resources are limited. It paves the way for more practical research on communication-efficient algorithms, network protocols and architectures for the AIoT edge.
Keywords
Co-simulation, Distributed training, Efficient communications and networking, Edge AI/ML, Artificial Intelligence of Things (AIoT)