AVOT: Audio-Visual Object Tracking of Multiple Objects for Robotics

ICRA (2020)

Cited by 15
Abstract
Existing state-of-the-art object trackers can run into challenges when objects collide, occlude, or come close to one another. These visually based trackers may also fail to differentiate between objects with the same appearance but different materials. Existing methods may stop tracking or incorrectly begin tracking another object, and such failures are difficult for trackers to recover from since they often rely on results from previous frames. By using audio of the impact sounds from object collisions, rolling, and similar events, our audio-visual object tracking (AVOT) neural network can reduce tracking error and drift. We train AVOT end to end and use audio-visual inputs over all frames. Our audio-based technique may be used in conjunction with other neural networks to augment visually based object detection and tracking methods. We evaluate its runtime frames-per-second (FPS) and intersection-over-union (IoU) performance against OpenCV object tracking implementations and a deep learning method. Our experiments on the synthetic Sound-20K audio-visual dataset demonstrate that AVOT outperforms single-modality deep learning methods when there is audio from object collisions. A proposed scheduler network that switches between AVOT and other methods based on audio onsets maximizes accuracy and performance over all frames in multimodal object tracking.
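The evaluation metric (IoU) and the onset-driven scheduler idea from the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the onset detector or the tracker interfaces, so the energy-based `detect_onset`, its threshold, and the callable tracker stubs below are assumptions for illustration only.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def detect_onset(frame_audio, threshold=0.1):
    """Hypothetical onset proxy: mean squared amplitude of the frame's audio."""
    energy = sum(s * s for s in frame_audio) / max(len(frame_audio), 1)
    return energy > threshold

def schedule(frames, audio_frames, avot_track, visual_track):
    """Per frame, route to the audio-visual tracker when an onset is heard,
    otherwise fall back to the visually based tracker."""
    boxes = []
    for frame, audio in zip(frames, audio_frames):
        tracker = avot_track if detect_onset(audio) else visual_track
        boxes.append(tracker(frame))
    return boxes
```

In this sketch, `avot_track` and `visual_track` stand in for the AVOT network and a conventional tracker; both take a frame and return a bounding box.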
Keywords
multiple objects, visually based trackers, object collisions, audio-visual object tracking neural network, tracking error, AVOT, audio-visual inputs, visually based object detection, tracking methods, OpenCV object tracking implementations, deep learning method, audio-visual dataset, single-modality deep learning methods, audio onset, multimodal object tracking, state-of-the-art object tracking