Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
arXiv (2023)
Abstract
Streaming neural network models for fast frame-wise responses to various
speech and sensory signals are widely adopted on resource-constrained
platforms. Hence, increasing the learning capacity of such streaming models
(i.e., by adding more parameters) to improve the predictive power may not be
viable for real-world tasks. In this work, we propose a new loss, Streaming
Anchor Loss (SAL), to better utilize the given learning capacity by encouraging
the model to learn more from essential frames. More specifically, our SAL and
its focal variations dynamically modulate the frame-wise cross entropy loss
based on the importance of the corresponding frames so that a higher loss
penalty is assigned for frames within the temporal proximity of semantically
critical events. Therefore, our loss ensures that the model training focuses on
predicting the relatively rare but task-relevant frames. Experimental results
with standard lightweight convolutional and recurrent streaming networks on
three different speech-based detection tasks demonstrate that SAL enables the
model to learn the overall task more effectively with improved accuracy and
latency, without any additional data, model parameters, or architectural
changes.
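
To make the idea of modulating the frame-wise cross entropy loss by temporal proximity more concrete, here is a minimal sketch in Python. It is not the SAL formulation from the paper, which the abstract does not spell out. The exponential weighting and the names `proximity_weights`, `weighted_framewise_ce`, `base`, `boost`, and `width` are all assumptions chosen for illustration, and the anchor frames are assumed to be given as indices of the semantically critical events.

```python
import numpy as np

def proximity_weights(num_frames, anchor_frames, base=1.0, boost=4.0, width=5.0):
    """Hypothetical per-frame weights that grow near the closest anchor frame.

    Assumes at least one anchor index. The exponential decay is an
    illustrative choice, not the weighting defined in the paper.
    """
    t = np.arange(num_frames)[:, None]             # shape (T, 1)
    anchors = np.asarray(anchor_frames)[None, :]   # shape (1, A)
    dist = np.abs(t - anchors).min(axis=1)         # distance to nearest anchor
    return base + boost * np.exp(-dist / width)

def weighted_framewise_ce(logits, labels, weights):
    """Frame-wise cross entropy, scaled per frame and averaged over time."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every frame.
    ce = -log_probs[np.arange(len(labels)), labels]
    return float((weights * ce).mean())

# Toy usage: 10 frames, 2 classes, one critical event at frame 4.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 2))
labels = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0])
weights = proximity_weights(10, anchor_frames=[4])
loss = weighted_framewise_ce(logits, labels, weights)
```

With these assumed weights, a misprediction at or near frame 4 contributes more to the loss than the same misprediction far from it, which is the qualitative behavior the abstract describes.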