Towards In-Vehicle Multi-Task Facial Attribute Recognition: Investigating Synthetic Data and Vision Foundation Models
CoRR (2024)
Abstract
In the burgeoning field of intelligent transportation systems, enhancing
vehicle-driver interaction through facial attribute recognition, such as facial
expression, eye gaze, age, etc., is of paramount importance for safety,
personalization, and overall user experience. However, the scarcity of
comprehensive large-scale, real-world datasets poses a significant challenge
for training robust multi-task models. Existing literature often overlooks the
potential of synthetic datasets and the comparative efficacy of
state-of-the-art vision foundation models in such constrained settings. This
paper addresses these gaps by investigating the utility of synthetic datasets
for training complex multi-task models that recognize facial attributes of
vehicle passengers, such as gaze plane, age, and facial expression.
Utilizing transfer learning techniques with both pre-trained Vision Transformer
(ViT) and Residual Network (ResNet) models, we explore various training and
adaptation methods to optimize performance, particularly when data availability
is limited. We provide extensive post-evaluation analysis, investigating the
effects of synthetic data distributions on model performance for both
in-distribution data and out-of-distribution inference. Our study unveils counter-intuitive
findings, notably the superior performance of ResNet over ViTs in our specific
multi-task context, which is attributed to a mismatch between model complexity
and task complexity. Our results highlight the challenges and
opportunities for enhancing the use of synthetic data and vision foundation
models in practical applications.