This project focuses on developing a sign language recognition model using deep learning techniques. It combines MediaPipe for feature extraction with an LSTM-based neural network that classifies gestures from video input. The model aims to enhance accessibility by enabling real-time recognition of sign language.
Data Collection & Processing:
- The notebook begins by installing the necessary dependencies and importing key libraries, including MediaPipe, OpenCV, NumPy, and PyTorch.
- It extracts hand, pose, and face landmarks from sign language videos and stores them as structured numerical data (see the extraction sketch after this list).
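A minimal sketch of this extraction step, assuming OpenCV video capture and the MediaPipe Holistic solution; the video path and function name are illustrative, not the notebook's exact code:

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_landmarks(video_path):
    """Yield MediaPipe Holistic results for every frame of a video."""
    cap = cv2.VideoCapture(video_path)
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB images; OpenCV reads frames as BGR.
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            yield holistic.process(rgb)
    cap.release()
```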
Feature Extraction & Storage:
- Each frame is processed with the MediaPipe Holistic model, which detects pose, face, and hand landmarks.
- The extracted landmarks are flattened into fixed-length keypoint vectors and saved as NumPy (.npy) files, creating a dataset for training (a sketch follows this list).
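The sketch below shows one plausible way to flatten a frame's Holistic results into a single vector and save a whole sequence to disk. It reuses the `extract_landmarks` helper from the previous sketch; the output path, file names, and the per-frame layout (33 pose points with visibility, 468 face points, 21 points per hand, giving 1662 values) are assumptions, not confirmed details of the notebook:

```python
import numpy as np

def keypoints_from(results):
    """Flatten one frame's Holistic landmarks into a single 1662-value vector."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])  # 33*4 + 468*3 + 21*3 + 21*3 = 1662

# One .npy file per video: shape (num_frames, 1662).
sequence = np.stack([keypoints_from(r) for r in extract_landmarks("hello_001.mp4")])
np.save("data/hello/hello_001.npy", sequence)
```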
Model Development:
- The dataset is split into training and testing sets.
- A multi-layer LSTM model is implemented in PyTorch, consisting of stacked LSTM layers followed by fully connected layers.
- Optimization techniques such as dropout, layer normalization, and learning rate scheduling are applied to improve performance (see the model sketch after this list).
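A minimal PyTorch sketch of this kind of stacked-LSTM classifier with dropout and layer normalization; the hidden sizes, the 1662-dimensional input, and the class count are assumed defaults rather than the notebook's actual hyperparameters:

```python
import torch
import torch.nn as nn

class SignLSTM(nn.Module):
    def __init__(self, input_size=1662, hidden_size=128,
                 num_layers=2, num_classes=10, dropout=0.3):
        super().__init__()
        # Stacked LSTM over the per-frame keypoint vectors.
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=num_layers,
                            batch_first=True,
                            dropout=dropout)
        self.norm = nn.LayerNorm(hidden_size)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 64),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):                 # x: (batch, frames, 1662)
        out, _ = self.lstm(x)             # (batch, frames, hidden)
        last = self.norm(out[:, -1, :])   # last time step, normalized
        return self.head(last)            # (batch, num_classes) logits
```

Classifying from the last time step keeps the head small; pooling over all time steps is a common alternative if sequences vary a lot in length.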
Training & Evaluation:
- The model is trained on gesture sequences using cross-entropy loss and an SGD optimizer with momentum.
- Early stopping prevents overfitting by monitoring validation loss.
- The trained model is evaluated with classification accuracy on the held-out test set to check how reliably it recognizes sign gestures (a training-loop sketch follows this list).
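A hedged sketch of such a training loop: cross-entropy loss, SGD with momentum, a step learning-rate scheduler, and early stopping on validation loss. The hyperparameters, checkpoint path, and the `train_loader`/`val_loader` objects are assumptions:

```python
import torch
import torch.nn as nn

# train_loader / val_loader: DataLoaders yielding (sequence, label) batches (assumed to exist).
model = SignLSTM()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

    # Validation pass for early stopping and accuracy.
    model.eval()
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for x, y in val_loader:
            logits = model(x)
            val_loss += criterion(logits, y).item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.size(0)

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping: validation loss stopped improving

print(f"validation accuracy: {correct / total:.2%}")
```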
Prediction & Deployment:
- The model can infer sign language gestures from new video inputs (see the inference sketch after this list).
- Future improvements include real-time inference integration and deployment via an API or mobile applications.
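An illustrative inference sketch that reuses the `extract_landmarks`, `keypoints_from`, and `SignLSTM` helpers from the earlier sketches; the label list, checkpoint name, and video path are placeholder assumptions:

```python
import numpy as np
import torch

GESTURES = ["hello", "thanks", "iloveyou"]  # example label set

model = SignLSTM(num_classes=len(GESTURES))
model.load_state_dict(torch.load("best_model.pt"))
model.eval()

# Extract keypoints from a new video and classify the whole sequence.
sequence = np.stack([keypoints_from(r) for r in extract_landmarks("new_sign.mp4")])
with torch.no_grad():
    logits = model(torch.from_numpy(sequence).float().unsqueeze(0))
    print("predicted sign:", GESTURES[logits.argmax(dim=1).item()])
```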
This notebook presents a structured pipeline for sign language recognition, leveraging deep learning and computer vision. The combination of MediaPipe for feature extraction and an LSTM for sequence modeling makes it well suited to gesture classification. Further refinements can improve real-time recognition performance, making it a valuable tool for accessibility solutions.