Abstract—Hand gesture recognition has been studied for decades because of its applications in sign language, virtual games, human-robot interaction, entertainment, and other fields. The problem remains challenging, however, because understanding a hand gesture requires combining multiple sources of information over a temporal sequence. Recent progress has been driven by advances in hardware, such as readily available 3D cameras and Kinect sensors, by the strong performance of modern computer vision techniques such as manifold learning and deep learning, and by a variety of multimodal fusion strategies. Together these have improved the extraction of features from multimodal data for hand gesture recognition. This paper addresses dynamic hand gesture recognition in daily-life settings. We consider methods for extracting features from different data sources (RGB images and depth images) based on both manifold learning and deep learning. For RGB data, a manifold learning technique extracts spatial features, which are then combined with temporal features extracted by the Kanade-Lucas-Tomasi (KLT) technique. For depth data, among the many deep learning architectures in the literature that achieve good results in human activity detection, we propose a simple convolutional neural network that extracts features from the depth motion map, a representation that combines spatial and temporal aspects of the depth information. Fusion algorithms then combine the extracted features to improve the accuracy of the final dynamic hand gesture recognition. Evaluation results show that the best accuracy reaches 84.7%, which is significantly higher than the 78.4% reported in previous work. The proposed method offers a feasible solution to technical issues in using multiple modalities and multiple viewpoints of hand gestures.
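To make the depth motion map idea concrete, the following is a minimal sketch of one common formulation: accumulating the absolute differences between consecutive depth frames so that a single 2D map summarizes where and how much motion occurred. It is an illustration under stated assumptions, not the implementation used in this paper. The function name `depth_motion_map`, the `threshold` parameter, and the use of nested lists for frames are all hypothetical choices made for this example.

```python
from typing import List

# A depth frame is assumed here to be a 2D grid of depth values.
Frame = List[List[float]]


def depth_motion_map(frames: List[Frame], threshold: float = 0.0) -> Frame:
    """Accumulate absolute differences between consecutive depth frames.

    Assumption: differences at or below `threshold` are treated as sensor
    noise and ignored. The paper does not specify such a threshold.
    """
    if len(frames) < 2:
        raise ValueError("need at least two depth frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * cols for _ in range(rows)]
    # Walk over consecutive frame pairs (t, t+1) and sum the motion energy.
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                diff = abs(curr[r][c] - prev[r][c])
                if diff > threshold:
                    dmm[r][c] += diff
    return dmm


if __name__ == "__main__":
    # Three tiny 2x2 depth frames, invented for illustration only.
    frames = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 0.0], [0.0, 0.0]],
        [[3.0, 0.0], [0.0, 2.0]],
    ]
    print(depth_motion_map(frames))
```

Because the whole sequence collapses into one map, the result can be passed to a 2D convolutional network like an ordinary image, which is one reason this representation pairs naturally with a simple CNN.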
Index Terms—Hand gesture recognition, convolutional neural network, RGB-Depth images, multi-modalities, multi-viewpoints, depth motion map, manifold learning.
Huong-Giang Doan is with the Control and Automation Faculty at Electrical Power University Hanoi, Vietnam (e-mail: firstname.lastname@example.org).
Van-Toi Nguyen is with the Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Vietnam (e-mail: email@example.com).
Cite: Huong-Giang Doan and Van-Toi Nguyen, "Improving Dynamic Hand Gesture Recognition on Multi-views with Multi-modalities," International Journal of Machine Learning and Computing, vol. 9, no. 6, pp. 795-800, 2019.
Copyright © 2019 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).