Real-time Gesture Recognition System using Mediapipe and LSTM Neural Networks

Authors

  • Siddharth Kashyap
  • Suraj Saxena
  • Sumit Gautam
  • Latika Sharma
  • Anurag Gupta
  • Bhoopendra Kumar

Keywords:

Gesture recognition, Deep learning, Hand Tracking, LSTM, Real Time Processing

Abstract

This research presents a real-time gesture recognition system that combines computer vision techniques with deep learning. The system's objective is to interpret hand gestures captured by a camera in real time and translate them into commands or actions. Hands are first detected and tracked with MediaPipe Hands, a hand detection and tracking framework. The system then extracts features through keypoint analysis, capturing spatial information about hand position and movement in each frame. The resulting feature sequences are fed into a Long Short-Term Memory (LSTM) network, a recurrent architecture suited to capturing temporal dependencies in sequential data, which is trained to classify the detected gestures into predefined action categories. Trained on diverse datasets, the model learns to recognize a wide range of hand gestures accurately and robustly. The system has applications in sign language translation, human-computer interaction, and virtual reality interfaces, where its real-time operation supports intuitive, natural interaction between users and devices. Experimental evaluations on real-world datasets demonstrate the effectiveness and efficiency of the proposed approach and indicate its potential for adoption in practical scenarios.
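
The data flow described in the abstract (per-frame hand keypoints, flattened into a feature sequence, then classified by an LSTM) can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the abstract does not give the sequence length, hidden size, or number of gesture classes, so `SEQ_LEN`, `HIDDEN`, and `NUM_CLASSES` are hypothetical values, the weights are random rather than trained, and random numbers stand in for the landmark coordinates a hand tracker would supply. The only fixed figure is the 21 three-dimensional landmarks per hand that MediaPipe Hands reports.

```python
import numpy as np

NUM_LANDMARKS = 21   # MediaPipe Hands reports 21 landmarks per hand
COORDS = 3           # x, y, z per landmark
SEQ_LEN = 30         # hypothetical: frames per gesture clip
HIDDEN = 32          # hypothetical: LSTM hidden size
NUM_CLASSES = 5      # hypothetical: number of gesture categories


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def flatten_landmarks(frames):
    """Turn (T, 21, 3) landmark frames into a (T, 63) feature sequence."""
    frames = np.asarray(frames, dtype=float)
    return frames.reshape(frames.shape[0], -1)


def init_params(input_dim, hidden, num_classes, seed=0):
    """Random (untrained) weights; a real system would learn these."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (input_dim, 4 * hidden)),  # input weights
        "U": rng.normal(0.0, 0.1, (hidden, 4 * hidden)),     # recurrent weights
        "b": np.zeros(4 * hidden),
        "W_out": rng.normal(0.0, 0.1, (hidden, num_classes)),
        "b_out": np.zeros(num_classes),
    }


def lstm_classify(sequence, params):
    """Run one LSTM layer over the sequence, then softmax the last state."""
    hidden = params["U"].shape[0]
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    for x_t in sequence:
        gates = x_t @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = np.split(gates, 4)  # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g      # update the cell memory
        h = o * np.tanh(c)     # expose the gated hidden state
    logits = h @ params["W_out"] + params["b_out"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Random stand-in for the landmark coordinates of one gesture clip.
clip = np.random.default_rng(1).random((SEQ_LEN, NUM_LANDMARKS, COORDS))
features = flatten_landmarks(clip)
params = init_params(features.shape[1], HIDDEN, NUM_CLASSES)
probs = lstm_classify(features, params)
predicted_class = int(np.argmax(probs))
```

Because the weights are untrained, `predicted_class` carries no meaning here; the sketch only shows how the shapes line up, with each frame contributing a 63-value feature vector and the LSTM's final hidden state mapped to one probability per gesture category.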

Published

2025-05-15

How to Cite

Kashyap S, Saxena S, Gautam S, Sharma L, Gupta A, Kumar B. Real-time Gesture Recognition System using Mediapipe and LSTM Neural Networks. J Neonatal Surg [Internet]. 2025 May 15 [cited 2025 Sep 21];14(24S):72-81. Available from: https://jneonatalsurg.com/index.php/jns/article/view/5894
