Efficient per-shot transformer-based bitrate ladder prediction for adaptive video streaming

2023 IEEE International Conference on Image Processing (ICIP 2023)

Abstract
Recently, HTTP adaptive streaming (HAS) has become a standard approach for over-the-top (OTT) video streaming services due to its ability to provide smooth streaming. In HAS, each stream representation is encoded at a specific target bitrate, and together these representations cover a wide range of operating bitrates known as the bitrate ladder. In the past, a fixed bitrate ladder for all videos has been widely used. However, such a method does not consider video content, which can vary considerably in motion, texture, and scene complexity. Moreover, building a per-title bitrate ladder through exhaustive encoding is quite expensive due to the large encoding parameter space. Thus, alternative solutions that allow accurate and efficient per-title bitrate ladder prediction are in great demand. Meanwhile, self-attention-based architectures have achieved tremendous performance in large language models (LLMs) and, in particular, vision transformers (ViTs) in computer vision tasks. Therefore, this paper investigates the capabilities of ViTs in building an efficient bitrate ladder without performing any encoding. We provide the first in-depth analysis of the prediction accuracy and the complexity overhead induced by the ViT model when predicting the bitrate ladder on a large and diverse video dataset. The source code of the proposed solution and the dataset will be made publicly available.
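To make the bitrate-ladder idea concrete, the sketch below shows one common way a per-shot ladder can be assembled once a model (such as the ViT predictor described above) has estimated quality for candidate (bitrate, resolution) encodings: keep, for each target bitrate, the resolution with the highest predicted quality, and drop rungs whose quality does not improve on a cheaper rung. This is an illustrative assumption, not the paper's actual construction; the candidate bitrates, resolutions, and quality values are hypothetical.

```python
def build_ladder(predictions):
    """Build a per-shot bitrate ladder from model predictions.

    predictions: iterable of (bitrate_kbps, resolution, predicted_quality)
    Returns a list of (bitrate_kbps, resolution) rungs sorted by bitrate,
    keeping for each bitrate the best-quality resolution and enforcing
    that quality increases monotonically along the ladder.
    """
    # For each candidate bitrate, keep the resolution with the highest
    # predicted quality (the "crossover" resolution at that rate).
    best = {}
    for bitrate, res, quality in predictions:
        if bitrate not in best or quality > best[bitrate][1]:
            best[bitrate] = (res, quality)

    # Walk the bitrates in ascending order; discard rungs that do not
    # improve quality over a cheaper rung (Pareto filtering).
    ladder, last_quality = [], float("-inf")
    for bitrate in sorted(best):
        res, quality = best[bitrate]
        if quality > last_quality:
            ladder.append((bitrate, res))
            last_quality = quality
    return ladder


# Hypothetical predicted qualities (e.g. PSNR in dB) for illustration only.
predictions = [
    (500, "540p", 34.0), (500, "1080p", 31.5),
    (1500, "540p", 36.0), (1500, "1080p", 38.2),
    (3000, "1080p", 41.0), (3000, "2160p", 39.0),
    (6000, "2160p", 40.5),  # dropped: no quality gain over the 3000 kbps rung
]
print(build_ladder(predictions))
```

The key point the paper exploits is that the expensive part of this pipeline is obtaining the quality values; predicting them directly from the video content removes the need for exhaustive trial encodings.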
Keywords
Bitrate ladder, video compression, HEVC, vision transformer, adaptive video streaming