TL;DR
A comprehensive monocular geometry benchmark for evaluating SOTA discriminative and generative foundation models for depth and surface normal estimation. The conclusions are:
1. Discriminative models pretrained on large-scale data (e.g. DINOv2) can outperform generative models pretrained with Stable Diffusion when trained on a small amount of synthetic data under the same training configuration.
2. Synthetic data is critical for fine-grained depth estimation. Data quality is a more important factor than model architecture and data scale.
3. Inductive bias is critical for surface normal estimation.
Citation
@article{ge2024geobench,
title={GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models},
author={Ge, Yongtao and Xu, Guangkai and Zhao, Zhiyue and Huang, Zheng and Sun, Libo and Sun, Yanlong and Chen, Hao and Shen, Chunhua},
journal={arXiv preprint arXiv:2406.12671},
year={2024}
}