3D Human Reconstruction in the Wild with Synthetic Data using Generative Models

Yongtao Ge1,2    Wenjia Wang3    Yongfan Chen2    Hao Chen2    Chunhua Shen2   
1The University of Adelaide    2Zhejiang University    3The University of Hong Kong   

Abstract

Despite the remarkable progress made on 3D human pose and shape estimation (HPS), current state-of-the-art methods rely heavily on either confined indoor mocap datasets or datasets rendered with computer-graphics (CG) engines. Both categories of datasets fall short of providing diverse human identities and authentic in-the-wild background scenes, which are crucial for modeling real-world distributions. In this work, we show that synthetic data created by generative models is complementary to CG-rendered data for achieving strong generalization on diverse real-world scenes. Specifically, we propose HumanWild, an effective approach based on recent diffusion models that generates human images together with corresponding 3D mesh annotations. The resulting dataset comprises 0.79M images with 3D annotations, covering versatile viewpoints, scenes, and human identities. By relying exclusively on generative models, we produce large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
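The core data-generation step can be pictured as a text-to-image diffusion model guided by a surface-normal map. Below is a minimal sketch of that idea built from off-the-shelf components in the diffusers library; the checkpoints (Stable Diffusion v1.5 with a normal-conditioned ControlNet) and the pre-rendered normal-map file are illustrative assumptions, not necessarily the authors' exact setup.

import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load a normal-map-conditioned ControlNet and a base text-to-image model.
# Checkpoint choices are assumptions for illustration.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-normal", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The conditioning normal map would be rendered from a posed 3D human mesh
# (e.g. SMPL-X), so the sampled image stays pixel-aligned with known geometry.
normal_map = Image.open("smplx_normal_map.png")  # hypothetical pre-rendered input

image = pipe(
    prompt="A woman sitting in the kitchen.",
    image=normal_map,
    num_inference_steps=30,
).images[0]
image.save("humanwild_sample.png")

Because each image is sampled under a geometry condition derived from a posed body mesh, the 3D mesh annotation comes for free with every generated image.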

HumanWild from Text and Surface Normal

(Gallery of generated samples; each image is conditioned on a surface-normal map and one of the text prompts below.)
A woman sitting in the kitchen.
A woman in the dining room.
A woman on the grass, distorted camera.
A man in the study.
A man in the kitchen.
A man walking on the beach, distorted camera.
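As a rough picture of where the surface-normal condition could come from, the sketch below computes per-vertex normals of a posed body mesh (e.g. SMPL-X vertices and faces), rotates them into the camera frame, and encodes them as RGB values. The function name and the world-to-camera rotation R are hypothetical, not part of the released code.

import numpy as np

def normals_to_rgb(vertices, faces, R):
    # vertices: (V, 3) mesh vertices; faces: (F, 3) triangle indices;
    # R: (3, 3) world-to-camera rotation (hypothetical input).
    corners = vertices[faces]                                  # (F, 3, 3)
    face_n = np.cross(corners[:, 1] - corners[:, 0],
                      corners[:, 2] - corners[:, 0])           # face normals
    vert_n = np.zeros_like(vertices)
    for i in range(3):                                         # accumulate onto vertices
        np.add.at(vert_n, faces[:, i], face_n)
    vert_n /= np.linalg.norm(vert_n, axis=1, keepdims=True) + 1e-8
    cam_n = vert_n @ R.T                                       # camera-space normals
    return ((cam_n + 1.0) * 0.5 * 255).astype(np.uint8)       # [-1, 1] -> [0, 255]

Rasterizing these per-vertex colors with any standard renderer yields the normal map that conditions the diffusion model.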

HumanWild with Background Mesh

Citation

@article{ge2024humanwild,
    title={3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models},
    author={Ge, Yongtao and Wang, Wenjia and Chen, Yongfan and Chen, Hao and Shen, Chunhua},
    journal={arXiv preprint arXiv:2403.11111},
    year={2024}
}