Audio-Visual Tracking of Multiple Speakers Via a PMBM Filter.

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
Audio-visual tracking of multiple speakers requires estimating the state (e.g., location and velocity) of each speaker by leveraging information from both the audio and visual modalities. Jointly estimating the number of speakers and their states remains a challenging problem. We propose an Audio-Visual Poisson Multi-Bernoulli Mixture Filter (AV-PMBM) that can not only predict the number of speakers but also estimate their states accurately. We also propose a novel sound source localization technique, based on direction-of-arrival (DOA) information and a deep-learning-based object detector, to provide reliable audio measurements for the AV tracker. To our knowledge, this is the first attempt to use a PMBM filter for multi-speaker tracking with audio-visual modalities. Experiments on the AV16.3 dataset demonstrate that AV-PMBM achieves state-of-the-art performance in terms of the optimal sub-pattern assignment (OSPA) metric.
Keywords
multiple-speaker tracking,audio-visual fusion,PMBM filter
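
The abstract reports results using the OSPA metric, which scores both localization error and errors in the estimated number of speakers in a single value. The following is a minimal sketch of the standard OSPA distance, not the authors' evaluation code. The function name `ospa`, the 2-D point representation, and the default cutoff `c` and order `p` are assumptions chosen for illustration; the abstract does not state the values used in the experiments.

```python
import itertools
import math


def ospa(estimates, truths, c=1.0, p=1):
    """OSPA distance between two finite sets of points (hypothetical helper).

    estimates, truths: lists of equal-dimension coordinate tuples.
    c: cutoff, the penalty assigned to each unmatched (missed or false) point.
    p: order of the metric.
    """
    if not estimates and not truths:
        return 0.0
    # Let `small` be the set with fewer points, so every point in it
    # is assigned to a distinct point in `large`.
    if len(estimates) <= len(truths):
        small, large = estimates, truths
    else:
        small, large = truths, estimates
    m, n = len(small), len(large)
    # Brute-force search over all assignments; adequate for a handful
    # of speakers, but factorial in the number of points.
    best = min(
        sum(min(math.dist(s, large[j]), c) ** p for s, j in zip(small, perm))
        for perm in itertools.permutations(range(n), m)
    )
    # Each of the (n - m) unmatched points contributes the cutoff penalty.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

For example, with `c=1.0` and `p=1`, a tracker that localizes one of two speakers perfectly and misses the other scores `ospa([(0, 0)], [(0, 0), (5, 5)])`, which is 0.5. For larger sets, the exhaustive search could be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment`.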