Estimating 3D Human Skeleton from a Single RGB Image
Author Information
Author(s): Lie Wen-Nung, Vann Veasna
Primary Institution: National Chung Cheng University, Taiwan
Hypothesis
Can we accurately estimate a 3D human skeleton from a single RGB image by fusing predicted depths from multiple virtual viewpoints?
Conclusion
The proposed method achieves a mean per-joint position error (MPJPE) of 45.7 mm, outperforming several prior studies.
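For context, per-joint position error refers to the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames; this is the standard MPJPE metric used on Human3.6M. A minimal NumPy sketch of the computation, with the array shapes and the 17-joint layout as illustrative assumptions:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error in the units of the inputs (e.g., mm).

    pred, gt: arrays of shape (num_frames, num_joints, 3) holding 3D joint
    coordinates, typically root-aligned before comparison.
    """
    # Euclidean distance per joint, then average over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example with synthetic data: 100 frames, 17 joints (Human3.6M convention).
gt = np.random.randn(100, 17, 3) * 50.0
pred = gt + np.random.randn(100, 17, 3) * 10.0
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```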
Supporting Evidence
- The method outperforms single-image-based methods when evaluated on the Human3.6M dataset.
- It achieves performance comparable to state-of-the-art methods that use long image sequences.
- The proposed approach simplifies the system by using a single RGB image instead of multiple cameras.
Takeaway
This study demonstrates that a 3D human skeleton can be estimated accurately from a single RGB image by fusing depth predictions from multiple virtual viewpoints.
Methodology
The method uses a two-stage network: a Real-Net stream predicts 2D joint coordinates and relative joint depths from the input image, a Virtual-Net stream estimates joint depths as seen from virtual viewpoints, and a fusion module combines these estimates into the final 3D skeleton.
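The paper's precise architecture is not reproduced here; the following PyTorch sketch only illustrates the two-stream structure described above. The layer sizes, the simple backbone, the joint and viewpoint counts, and the concatenation-based fusion are all assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 17  # Human3.6M convention; an assumption here
NUM_VIEWS = 4    # number of virtual viewpoints; an assumption here

class RealNet(nn.Module):
    """Stream predicting 2D joint coordinates and relative joint depths."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head_2d = nn.Linear(feat_dim, NUM_JOINTS * 2)   # (u, v) per joint
        self.head_depth = nn.Linear(feat_dim, NUM_JOINTS)    # relative depth per joint

    def forward(self, img):
        f = self.backbone(img)
        return self.head_2d(f).view(-1, NUM_JOINTS, 2), self.head_depth(f)

class VirtualNet(nn.Module):
    """Stream estimating per-joint depths as seen from virtual viewpoints."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_VIEWS * NUM_JOINTS),
        )

    def forward(self, joints_2d):
        return self.net(joints_2d.flatten(1)).view(-1, NUM_VIEWS, NUM_JOINTS)

class FusionModule(nn.Module):
    """Fuses real-view and virtual-view depth estimates into 3D joints."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * (2 + 1 + NUM_VIEWS), feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_JOINTS * 3),
        )

    def forward(self, joints_2d, depth_real, depth_virtual):
        # Concatenate 2D coordinates with depths from all views before fusion.
        x = torch.cat(
            [joints_2d.flatten(1), depth_real, depth_virtual.flatten(1)], dim=1
        )
        return self.net(x).view(-1, NUM_JOINTS, 3)

# One forward pass on a dummy image batch.
img = torch.randn(2, 3, 256, 256)
real, virt, fusion = RealNet(), VirtualNet(), FusionModule()
j2d, d_real = real(img)
d_virt = virt(j2d)
skeleton_3d = fusion(j2d, d_real, d_virt)
print(skeleton_3d.shape)  # torch.Size([2, 17, 3])
```

The sketch reflects one plausible reading of the pipeline: the Virtual-Net consumes the 2D joint estimates rather than the raw image, and fusion is a simple learned mapping over concatenated features. The actual paper may wire these stages differently.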
Limitations
The method struggles with heavy self-occlusions and complex interactions between multiple humans.