Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
arXiv (2024)
Abstract
We introduce Metric3D v2, a geometric foundation model for zero-shot metric
depth and surface normal estimation from a single image, which is crucial for
metric 3D recovery. While depth and normal are geometrically related and highly
complementary, they present distinct challenges. SoTA monocular depth methods
achieve zero-shot generalization by learning affine-invariant depths, which
cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods
have limited zero-shot performance due to the lack of large-scale labeled data.
To tackle these issues, we propose solutions for both metric depth estimation
and surface normal estimation. For metric depth estimation, we show that the
key to a zero-shot single-view model lies in resolving the metric ambiguity
from various camera models and large-scale data training. We propose a
canonical camera space transformation module, which explicitly addresses the
ambiguity problem and can be effortlessly plugged into existing monocular
models. For surface normal estimation, we propose a joint depth-normal
optimization module to distill diverse data knowledge from metric depth,
enabling normal estimators to learn beyond normal labels. Equipped with these
modules, our depth-normal models can be stably trained with over 16 million
images from thousands of camera models with different types of annotations,
resulting in zero-shot generalization to in-the-wild images with unseen camera
settings. Our method enables the accurate recovery of metric 3D structures on
randomly collected internet images, paving the way for plausible single-image
metrology. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
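
The abstract does not spell out how the canonical camera space transformation works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a pinhole camera, where an object's image size is proportional to focal length divided by distance, so that the same image can arise from a short focal length at close range or a long focal length at far range. Under that assumption, rescaling depth labels by the ratio of a fixed canonical focal length to the true focal length removes the ambiguity during training, and the inverse ratio restores metric depth at inference. The names `CANONICAL_FOCAL`, `to_canonical`, and `from_canonical`, and the value 1000.0, are hypothetical and chosen for illustration.

```python
# Sketch only: pinhole camera assumed; all names and values are illustrative.
CANONICAL_FOCAL = 1000.0  # hypothetical canonical focal length, in pixels


def to_canonical(depth_m, focal_px, canonical_focal=CANONICAL_FOCAL):
    """Rescale metric depth labels as if the image came from the canonical camera."""
    scale = canonical_focal / focal_px
    return [[d * scale for d in row] for row in depth_m]


def from_canonical(depth_c, focal_px, canonical_focal=CANONICAL_FOCAL):
    """Invert the rescaling to recover metric depth for the real camera."""
    scale = focal_px / canonical_focal
    return [[d * scale for d in row] for row in depth_c]


# Toy 2x2 depth map in metres, captured with a 500-pixel focal length.
depth = [[2.0, 4.0], [6.0, 8.0]]
canon = to_canonical(depth, focal_px=500.0)
restored = from_canonical(canon, focal_px=500.0)
```

In this sketch a network would be trained on `canon` rather than `depth`, so that images from cameras with different focal lengths share one consistent depth scale; the round trip through `from_canonical` returns the original metric values.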