Study of Spatio-Temporal Modeling in Video Quality Assessment.

Yuming Fang, Zhaoqian Li, Jiebin Yan, Xiangjie Sui, Hantao Liu

IEEE Trans. Image Process. (2023)


Abstract

Video quality assessment (VQA) has received remarkable attention recently. Most popular VQA models employ recurrent neural networks (RNNs) to capture the temporal quality variation of videos. However, each long-term video sequence is commonly labeled with a single quality score, from which RNNs might not be able to learn long-term quality variation well. A natural question then arises: what is the real role of RNNs in learning the visual quality of videos? Do they learn spatio-temporal representations as expected, or do they just aggregate spatial features redundantly? In this study, we conduct a comprehensive investigation by training a family of VQA models with carefully designed frame sampling strategies and spatio-temporal fusion methods. Our extensive experiments on four publicly available in-the-wild video quality datasets lead to two main findings. First, the plausible spatio-temporal modeling module (i.e., RNNs) does not facilitate quality-aware spatio-temporal feature learning. Second, sparsely sampled video frames can achieve performance competitive with using all video frames as input. In other words, spatial features play a vital role in capturing video quality variation for VQA. To the best of our knowledge, this is the first work to explore the issue of spatio-temporal modeling in VQA.
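
To make the second finding concrete, the following is a minimal sketch of what sparse frame sampling followed by a simple temporal fusion step could look like. It is not the authors' implementation: the function names `sample_sparse_indices` and `average_pool`, the choice of segment centres, and the use of mean pooling in place of an RNN are all assumptions made for illustration.

```python
def sample_sparse_indices(num_frames, num_samples):
    """Pick frame indices spread evenly over a video (hypothetical helper).

    The video is split into num_samples equal-length segments and the
    centre frame of each segment is chosen. This is only one possible
    sparse sampling strategy.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Nothing to subsample: fall back to using every frame.
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def average_pool(frame_scores):
    """Fuse per-frame quality scores into one video-level score.

    Mean pooling is an assumed stand-in for a recurrent fusion module.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    return sum(frame_scores) / len(frame_scores)


if __name__ == "__main__":
    # Hypothetical per-frame scores, e.g. from a spatial feature extractor.
    all_scores = [3.0 + 0.01 * i for i in range(100)]
    indices = sample_sparse_indices(len(all_scores), 4)
    sparse_scores = [all_scores[i] for i in indices]
    print(indices, average_pool(sparse_scores), average_pool(all_scores))
```

Comparing the pooled score from the sparse subset with the one from all frames mirrors, in a very reduced form, the kind of comparison the abstract describes.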


Keywords

Video quality assessment, spatio-temporal modeling, recurrent neural network
