Multi-Human Mesh Recovery with Transformers
CoRR (2024)
Abstract
Conventional approaches to human mesh recovery predominantly employ a
region-based strategy. This involves initially cropping out a human-centered
region as a preprocessing step, with subsequent modeling focused on this
zoomed-in image. While effective for single figures, this pipeline poses
challenges when dealing with images featuring multiple individuals, as
different people are processed separately, often leading to inaccuracies in
relative positioning. Despite the advantages of adopting a whole-image-based
approach to address this limitation, early efforts in this direction have
fallen short in performance compared to recent region-based methods. In this
work, we advocate for this under-explored area of modeling all people at once,
emphasizing its potential for improved accuracy in multi-person scenarios
through considering all individuals simultaneously and leveraging the overall
context and interactions. We introduce a new model with a streamlined
transformer-based design, featuring three critical design choices: multi-scale
feature incorporation, focused attention mechanisms, and relative joint
supervision. Our proposed model demonstrates a significant performance
improvement, surpassing state-of-the-art region-based and whole-image-based
methods on various benchmarks involving multiple individuals.