Deep Learning for Vietnamese Sign Language Recognition in Video Sequence


IJMLC 2019 Vol.9(4): 440-445 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2019.9.4.823

Anh H. Vo, Van-Huy. Pham, and Bao T. Nguyen

Abstract—For most hearing-impaired individuals in Vietnam, Vietnamese Sign Language (VSL) is the only means of communication. Consequently, a growing number of studies address the automatic translation of VSL to bridge communication between hearing-impaired and hearing people. However, automatic VSL recognition in video is challenging because of camera orientation, hand position and movement, inter-hand relations, and other factors. In this paper, we present several feature extraction approaches for VSL recognition, including spatial and scene-based features. Instead of relying on a static image, we specifically capture motion information between frames in a video sequence. For the recognition task, besides traditional sign language recognition methods such as SVM, we propose using a deep learning technique to model the dependence between frames in a video sequence. We collected two VSL datasets of words on the topic of family relatives (VSL-WRF), such as father, mother, uncle, and aunt. The first includes 12 Vietnamese words whose signs change only slightly between frames, while the second contains 15 words whose gestures involve the relative position of body parts and the orientation of the motion. Moreover, a data augmentation technique is proposed to capture more information about hand movement and hand position. The experiments achieved satisfactory results, with accuracies of 88.5% (traditional SVM) and 95.83% (deep learning). This indicates that deep learning combined with data augmentation provides more information about the orientation and movement of the hand and can improve the performance of a VSL recognition system.

Index Terms—Vietnamese sign language (VSL), VSL recognition, local descriptors, spatial feature, scene-based feature, motion-based feature, deep learning.
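
The idea of capturing motion information between frames, rather than relying on a static image, can be illustrated with a minimal sketch. This is not the authors' feature extractor, which the abstract does not specify. It assumes simple absolute frame differencing pooled over a coarse grid, and the function name `motion_features` and the `grid` parameter are hypothetical. The resulting per-frame vectors are the kind of sequence a frame-dependent model could consume.

```python
import numpy as np


def motion_features(frames: np.ndarray, grid: int = 4) -> np.ndarray:
    """Illustrative motion descriptor (an assumption, not the paper's method).

    frames: array of shape (T, H, W) holding grayscale video frames.
    Returns an array of shape (T - 1, grid * grid): for each pair of
    consecutive frames, the mean absolute intensity change in each grid cell.
    """
    frames = frames.astype(np.float32)
    # Absolute difference between each frame and the one before it.
    diffs = np.abs(frames[1:] - frames[:-1])  # shape (T - 1, H, W)
    t, h, w = diffs.shape
    # Crop so that height and width divide evenly into grid cells.
    h2, w2 = h - h % grid, w - w % grid
    diffs = diffs[:, :h2, :w2]
    # Split each difference image into grid x grid cells and average per cell.
    cells = diffs.reshape(t, grid, h2 // grid, grid, w2 // grid)
    return cells.mean(axis=(2, 4)).reshape(t, grid * grid)


# Usage: a synthetic 5-frame clip in which one 2x2 patch lights up in frame 2.
clip = np.zeros((5, 8, 8), dtype=np.uint8)
clip[2, 0:2, 0:2] = 255
features = motion_features(clip, grid=4)
```

Pooling over grid cells keeps a rough record of where in the image the motion occurred, which matters for signs that differ mainly in hand position.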

Anh H. Vo and Van-Huy. Pham are with the Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam (e-mail: vohoanganh@tdtu.edu.vn, phamvanhuy@tdtu.edu.vn).
Bao T. Nguyen is with the Faculty of Information Technology, University of Education and Technology, Ho Chi Minh City, Vietnam (e-mail: baont@hcmute.edu.vn).


Cite: Anh H. Vo, Van-Huy. Pham, and Bao T. Nguyen, "Deep Learning for Vietnamese Sign Language Recognition in Video Sequence," International Journal of Machine Learning and Computing, vol. 9, no. 4, pp. 440-445, 2019.

Copyright © 2019 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
