Estimating 3D Human Skeleton from a Single RGB Image
Author Information
Author(s): Lie Wen-Nung, Vann Veasna
Primary Institution: National Chung Cheng University, Taiwan
Hypothesis
Can we accurately estimate a 3D human skeleton from a single RGB image by fusing predicted depths from multiple virtual viewpoints?
Conclusion
The proposed method achieves a mean per-joint position error (MPJPE) of 45.7 mm, outperforming several prior studies.
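For context, per-joint position error refers to the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames; this is the standard MPJPE metric used on Human3.6M. A minimal NumPy sketch of the computation, with the array shapes and the 17-joint layout as illustrative assumptions:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error in the units of the inputs (e.g., mm).

    pred, gt: arrays of shape (num_frames, num_joints, 3) holding 3D joint
    coordinates, typically root-aligned before comparison.
    """
    # Euclidean distance per joint, then average over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example with synthetic data: 100 frames, 17 joints (Human3.6M convention).
gt = np.random.randn(100, 17, 3) * 50.0
pred = gt + np.random.randn(100, 17, 3) * 10.0
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```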
Supporting Evidence
- The method outperforms single-image-based methods when evaluated on the Human3.6M dataset.
- It achieves performance comparable to state-of-the-art methods that use long image sequences.
- The proposed approach simplifies the system by using a single RGB image instead of multiple cameras.
Takeaway
This study demonstrates that a 3D human skeleton can be estimated accurately from a single RGB image by fusing depth predictions from multiple virtual viewpoints.
Methodology
The method uses a two-stage network: a Real-Net stream predicts 2D joint coordinates and relative joint depths from the input image, a Virtual-Net stream estimates joint depths as seen from virtual viewpoints, and a fusion module combines these estimates into the final 3D skeleton.
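The paper's precise architecture is not reproduced here; the following PyTorch sketch only illustrates the two-stream structure described above. The layer sizes, the simple backbone, the joint and viewpoint counts, and the concatenation-based fusion are all assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 17  # Human3.6M convention; an assumption here
NUM_VIEWS = 4    # number of virtual viewpoints; an assumption here

class RealNet(nn.Module):
    """Stream predicting 2D joint coordinates and relative joint depths."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head_2d = nn.Linear(feat_dim, NUM_JOINTS * 2)   # (u, v) per joint
        self.head_depth = nn.Linear(feat_dim, NUM_JOINTS)    # relative depth per joint

    def forward(self, img):
        f = self.backbone(img)
        return self.head_2d(f).view(-1, NUM_JOINTS, 2), self.head_depth(f)

class VirtualNet(nn.Module):
    """Stream estimating per-joint depths as seen from virtual viewpoints."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_VIEWS * NUM_JOINTS),
        )

    def forward(self, joints_2d):
        return self.net(joints_2d.flatten(1)).view(-1, NUM_VIEWS, NUM_JOINTS)

class FusionModule(nn.Module):
    """Fuses real-view and virtual-view depth estimates into 3D joints."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * (2 + 1 + NUM_VIEWS), feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_JOINTS * 3),
        )

    def forward(self, joints_2d, depth_real, depth_virtual):
        # Concatenate 2D coordinates with depths from all views before fusion.
        x = torch.cat(
            [joints_2d.flatten(1), depth_real, depth_virtual.flatten(1)], dim=1
        )
        return self.net(x).view(-1, NUM_JOINTS, 3)

# One forward pass on a dummy image batch.
img = torch.randn(2, 3, 256, 256)
real, virt, fusion = RealNet(), VirtualNet(), FusionModule()
j2d, d_real = real(img)
d_virt = virt(j2d)
skeleton_3d = fusion(j2d, d_real, d_virt)
print(skeleton_3d.shape)  # torch.Size([2, 17, 3])
```

The sketch reflects one plausible reading of the pipeline: the Virtual-Net consumes the 2D joint estimates rather than the raw image, and fusion is a simple learned mapping over concatenated features. The actual paper may wire these stages differently.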
Limitations
The method struggles with heavy self-occlusions and complex interactions between multiple humans.