Utilization of Vision Transformer for Classification and Ranking of Video Distortions

Artificial Neural Networks in Pattern Recognition (2022)

Abstract
The perceptual quality of video surveillance footage affects several tasks in the surveillance process, such as the detection of anomalous objects. Videos captured by a camera are prone to various distortions, such as noise, smoke, haze, low or uneven illumination, blur, rain, and compression, all of which degrade visual quality. Automatic identification of distortions is therefore important when enhancing video quality. Video quality assessment involves two stages: (1) classification of the distortions affecting the video frames and (2) ranking of these distortions. A novel video dataset was used for training, validation, and testing. Working with this dataset was challenging because it included nine categories of distortions and four levels of severity. The greatest challenge was the presence of multiple types of distortions in the same video. The work presented in this paper addresses the problem of multi-label distortion classification and ranking. A vision transformer was used for feature learning. The experiment showed that the proposed solution performed well, with an F1 score of 77.9% for single distortions and 69.9% for single and multiple distortions combined. Moreover, the average accuracy of severity-level classification was 62%, with an average F1 score of 61%.
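
To illustrate how an F1 score can be computed when a frame may carry several distortions at once, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the abstract does not state which F1 averaging was used, so the macro average here is an assumption, and the label subset, the function names `per_label_f1` and `macro_f1`, and the toy frames are all hypothetical.

```python
from typing import Dict, List, Set

# Illustrative subset of distortion labels (the paper's dataset has nine categories).
LABELS = ["noise", "blur", "rain"]


def per_label_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute a binary F1 score for each label, treating it as present/absent per frame."""
    scores: Dict[str, float] = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> float:
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    scores = per_label_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Toy ground truth and predictions: each set holds the distortions in one frame.
y_true = [{"noise"}, {"blur", "rain"}, {"rain"}, {"noise", "blur"}]
y_pred = [{"noise"}, {"blur"}, {"rain", "noise"}, {"noise", "blur"}]

if __name__ == "__main__":
    print(per_label_f1(y_true, y_pred, LABELS))
    print(macro_f1(y_true, y_pred, LABELS))
```

Representing each frame as a set of labels keeps the multi-label case explicit: a frame with two distortions contributes to the counts of both labels independently, which is why a score over single and multiple distortions can differ from one over single distortions alone.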
Keywords
Distortion classification and ranking, Multi-label classification, Video quality assessment, Vision transformer