
Grasping Hand Pose Estimation from RGB Images Using Digital Human Model by Convolutional Neural Network - 18.154
K. Ino et al., "Grasping Hand Pose Estimation from RGB Images Using Digital Human Model by Convolutional Neural Network", in Proc. of 3DBODY.TECH 2018 - 9th Int. Conf. and Exh. on 3D Body Scanning and Processing Technologies, Lugano, Switzerland, 16-17 Oct. 2018, pp. 154-160, https://doi.org/10.15221/18.154.
Title:
Grasping Hand Pose Estimation from RGB Images Using Digital Human Model by Convolutional Neural Network
Authors:
Kentaro INO 1, Naoto IENAGA 1, Yuta SUGIURA 1, Hideo SAITO 1, Natsuki MIYATA 2, Mitsunori TADA 2
1 Keio University, Yokohama, Japan;
2 Digital Human Research Group, Human Informatics Research Institute, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
Abstract:
Recently, there has been an increase in research estimating hand poses from images. Because of the hand's high degree of freedom and self-occlusion, multi-view or depth images are often used. Our objective was to estimate hand poses specifically while grasping objects. When holding something, the hand moves in many directions; if the camera is too distant from the hand, the hand may move out of the camera's range, yet widening the viewing angle reduces the resolution beyond usable limits. One possible solution was developed by Kashiwagi: by mounting the camera on the object itself, the hand's pose can be estimated regardless of its position. However, Kashiwagi's method requires the fingertips' positions to be estimated first. Recently, another method using a convolutional neural network (CNN), which is well suited to estimating complex poses, has been proposed. Unfortunately, it is difficult to collect the large number of images with ground truth needed for training. In this research, we focused on creating a large dataset by generating hand pose images from a digital human model and motion-captured data using DhaibaWorks. We evaluated the model by calculating the average distance between the estimated pose and the ground truth of the test data, which was approximately 12.3 mm; the corresponding average distance in related work was 18.5 mm. We also tested our method with ordinary camera images and confirmed that it can be used in the real world. Our method provides a new means of dataset generation: annotations are produced automatically with motion capture technology, which reduces the time required. In future work, we will improve the architecture of the CNN and shorten the execution time for real-time processing.
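The evaluation described above (average distance between estimated and ground-truth poses, reported in millimetres) is commonly computed as the mean Euclidean distance over all joints and test samples. A minimal sketch of that metric follows; the array shapes, joint count, and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mean_joint_distance(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth
    3D joint positions, in the same unit as the inputs (e.g. mm).

    pred, gt: arrays of shape (num_samples, num_joints, 3).
    (Shapes and the 21-joint hand model below are assumptions.)
    """
    # Euclidean distance per joint: norm over the xyz axis.
    per_joint = np.linalg.norm(pred - gt, axis=-1)  # (num_samples, num_joints)
    # Average over all joints and all test samples.
    return per_joint.mean()

# Toy example: a constant 3 mm offset on every joint gives a 3 mm mean error.
gt = np.zeros((2, 21, 3))
pred = gt + np.array([3.0, 0.0, 0.0])
print(mean_joint_distance(pred, gt))  # 3.0
```

Under this metric, the paper's reported 12.3 mm would be the mean over every joint of every test image, compared with 18.5 mm in the related work.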
Details:
Full paper: 18154ino.pdf
Proceedings: 3DBODY.TECH 2018, 16-17 Oct. 2018, Lugano, Switzerland
Pages: 154-160
DOI: 10.15221/18.154
License/Copyright notice:
Proceedings: © Hometrica Consulting - Dr. Nicola D'Apuzzo, Switzerland, hometrica.ch.
Authors retain all rights to individual papers, which are licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The papers appearing in the proceedings reflect the authors' opinions. Their inclusion in the proceedings does not necessarily constitute endorsement by the editor or by the publisher.