Real-time Gesture Recognition System using Mediapipe and LSTM Neural Networks

Authors

Siddharth Kashyap
Suraj Saxena
Sumit Gautam
Latika Sharma
Anurag Gupta
Bhoopendra Kumar

Keywords:

Gesture recognition, Deep learning, Hand tracking, LSTM, Real-time processing

Abstract

This research presents a real-time gesture recognition system that combines computer vision techniques with deep learning. The system's objective is to interpret hand gestures captured by a camera in real time and translate them into commands or actions. The method integrates MediaPipe Hands, a hand detection and tracking framework. After hand detection, the system extracts features through keypoint analysis, capturing spatial information about hand movements. The extracted features are then fed into a Long Short-Term Memory (LSTM) network, a deep learning model suited to capturing temporal dependencies in sequential data. The LSTM-based model is trained to classify the detected hand gestures into predefined action categories. Through training on diverse datasets, the model learns to recognize a wide range of hand gestures accurately and robustly. The system has applications across several domains, including sign language translation, human-computer interaction, and virtual reality interfaces. Its real-time operation supports intuitive, natural interaction between users and devices. Experimental evaluations on real-world datasets demonstrate the effectiveness and efficiency of the proposed approach and indicate its potential for adoption in practical scenarios.
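The pipeline summarized in the abstract (hand keypoints per frame, a sequence of frames, an LSTM classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. The only detail taken as fact is that MediaPipe Hands reports 21 landmarks per hand, each with x, y, and z coordinates. Everything else is an assumption made for illustration: the wrist-relative normalization, the 30-frame window, the hidden size, the gesture labels, the helper names, and the randomly initialized (untrained) weights. The landmark tuples are generated randomly here; in a live system they would come from the hand tracker's per-frame output.

```python
import math
import random
from collections import deque

NUM_LANDMARKS = 21   # MediaPipe Hands reports 21 landmarks per hand
SEQ_LEN = 30         # assumed window length in frames (not from the paper)
HIDDEN = 8           # assumed LSTM hidden size, kept tiny for illustration
GESTURES = ["hello", "thanks", "yes"]  # hypothetical gesture labels


def landmarks_to_features(landmarks):
    """Flatten 21 (x, y, z) landmarks into a 63-value, wrist-relative vector."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError("expected 21 landmarks")
    wx, wy, wz = landmarks[0]  # landmark 0 is the wrist
    return [v for (x, y, z) in landmarks for v in (x - wx, y - wy, z - wz)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_params(input_size, hidden, classes, seed=0):
    """Random, untrained weights; a real system would load trained ones."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_gates": mat(4 * hidden, input_size + hidden),  # stacked i, f, g, o
        "b_gates": [0.0] * (4 * hidden),
        "W_out": mat(classes, hidden),
        "b_out": [0.0] * classes,
    }


def lstm_step(x, h, c, p):
    """One LSTM cell update for a single frame's feature vector x."""
    z = x + h  # concatenate input and previous hidden state
    n = len(h)
    pre = [sum(w * v for w, v in zip(row, z)) + b
           for row, b in zip(p["W_gates"], p["b_gates"])]
    i = [sigmoid(v) for v in pre[0:n]]            # input gate
    f = [sigmoid(v) for v in pre[n:2 * n]]        # forget gate
    g = [math.tanh(v) for v in pre[2 * n:3 * n]]  # candidate cell values
    o = [sigmoid(v) for v in pre[3 * n:4 * n]]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def classify(window, p):
    """Run the LSTM over a window of frames and return class probabilities."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for x in window:
        h, c = lstm_step(x, h, c, p)
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(p["W_out"], p["b_out"])]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Simulated real-time loop: keep only the most recent SEQ_LEN frames.
buffer = deque(maxlen=SEQ_LEN)
params = init_params(3 * NUM_LANDMARKS, HIDDEN, len(GESTURES))
rng = random.Random(1)
for _ in range(SEQ_LEN):
    fake_hand = [(rng.random(), rng.random(), rng.random())
                 for _ in range(NUM_LANDMARKS)]
    buffer.append(landmarks_to_features(fake_hand))

probs = classify(list(buffer), params)
print(GESTURES[probs.index(max(probs))], probs)
```

Because the weights are untrained, the printed label carries no meaning; the sketch only shows how per-frame keypoints become a fixed-length sequence and how an LSTM reduces that sequence to one probability per gesture class. The bounded deque is one simple way to keep latency constant in a live loop, since each new frame displaces the oldest one.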

Published

2025-05-15

How to Cite

Kashyap S, Saxena S, Gautam S, Sharma L, Gupta A, Kumar B. Real-time Gesture Recognition System using Mediapipe and LSTM Neural Networks. J Neonatal Surg [Internet]. 2025 May 15 [cited 2026 Feb. 9];14(24S):72-81. Available from: https://jneonatalsurg.com/index.php/jns/article/view/5894

Issue

Vol. 14 No. 24S (2025): Journal of Neonatal Surgery

Section

Original Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
