Indian Sign Language Translation

A neural-network-based system for translating Indian Sign Language using OpenPose keypoints and an LSTM.

Indian Sign Language (ISL) translation bridges the gap between the hearing-impaired community and those who do not know sign language.
Our system uses OpenPose to extract body and hand keypoints from video frames, then feeds the time series of those points into an LSTM network to produce text translations in real time.

We preprocess each frame’s keypoints into a flat vector and stack them across time to form the network input. The LSTM learns to map these motion patterns to text tokens—for example, translating the sign for “HELLO,” “THANK YOU,” or “WHERE” in real time.
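As a rough illustration of this step, the sketch below zero-pads or truncates each clip to 20 frames of 156 flattened values, matching the model's input shape further down; the exact keypoint layout and the helper name are assumptions, not the project's actual code:

    import numpy as np

    SEQ_LEN = 20        # timesteps expected by the LSTM input below
    NUM_FEATURES = 156  # flattened keypoint values per frame

    def frames_to_sequence(frame_keypoints):
        """Flatten per-frame keypoints and pad/truncate to a fixed-length clip.

        frame_keypoints: list of per-frame keypoint arrays.
        Returns an array of shape (SEQ_LEN, NUM_FEATURES); short clips are
        zero-padded so the model's Masking layer can skip the empty timesteps.
        """
        seq = np.zeros((SEQ_LEN, NUM_FEATURES), dtype=np.float32)
        for t, kp in enumerate(frame_keypoints[:SEQ_LEN]):
            seq[t] = np.asarray(kp, dtype=np.float32).ravel()[:NUM_FEATURES]
        return seq

    # One clip becomes a batch of shape (1, 20, 156) for the network:
    # batch = frames_to_sequence(clip_keypoints)[np.newaxis, ...]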

Figure: the end-to-end pipeline (video frames → keypoint extraction → sequence model → final translation).
Figure: successive video frames with body and hand keypoints overlaid; these are the inputs to our LSTM time-series network.

Key statistics for the dataset are as follows:

| Characteristic       |  Details  |
| :------------------- | :-------: |
| Categories           |    15     |
| Words                |    263    |
| Videos               |   4292    |
| Avg Videos per Class |   16.3    |
| Avg Video Length     |   2.57s   |
| Min Video Length     |   1.28s   |
| Max Video Length     |   6.16s   |
| Frame Rate           |  25 fps   |
| Resolution           | 1920x1080 |

Size of each category

| Category           | Number of Classes | Number of Videos |
| :----------------- | :---------------: | :--------------: |
| Adjectives         |        59         |       791        |
| Animals            |         8         |       166        |
| Clothes            |        10         |       198        |
| Colours            |        11         |       222        |
| Days and Time      |        22         |       306        |
| Greetings          |         9         |       185        |
| Means of Transport |         9         |       186        |
| Objects at Home    |        27         |       379        |
| Occupations        |        16         |       225        |
| People             |        26         |       513        |
| Places             |        19         |       399        |
| Pronouns           |         8         |       168        |
| Seasons            |         6         |        85        |
| Society            |        23         |       324        |
| **Total**          |      **263**      |     **4287**     |

Model Structure

    from tensorflow import keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Input, Masking, BatchNormalization,
                                         Bidirectional, LSTM, Dropout, Dense,
                                         Activation)

    translation_model = Sequential()
    # 20 timesteps per clip, 156 flattened keypoint values per frame
    translation_model.add(Input(shape=(20, 156)))
    # Zero-padded timesteps are masked out of the recurrent computation
    translation_model.add(Masking(mask_value=0.))
    translation_model.add(BatchNormalization())
    translation_model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2, return_sequences=True)))

    translation_model.add(Dropout(0.2))
    translation_model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2)))

    translation_model.add(Activation('elu'))
    translation_model.add(Dense(32, use_bias=False, kernel_initializer='he_normal'))

    translation_model.add(BatchNormalization())
    translation_model.add(Dropout(0.2))
    translation_model.add(Activation('elu'))
    translation_model.add(Dense(32, kernel_initializer='he_normal', use_bias=False))

    translation_model.add(BatchNormalization())
    translation_model.add(Activation('elu'))
    translation_model.add(Dropout(0.2))
    # One softmax unit per sign in the vocabulary
    translation_model.add(Dense(len(expression_mapping), activation='softmax'))

    isl_translator = ISLSignPosTranslator(bodypose_25_model(), handpose_model(), translation_model)

    Total params: 82,679 (322.96 KB)
    Trainable params: 82,239 (321.25 KB)
    Non-trainable params: 440 (1.72 KB)
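For context, a minimal sketch of how a model like this could be compiled and trained; the optimizer choice, the one-hot label encoding over expression_mapping, and the X_train/y_train names are assumptions rather than the project's actual training script:

    from tensorflow import keras

    # Hypothetical training data: X_train has shape (N, 20, 156),
    # y_train is one-hot encoded over the signs in expression_mapping.
    translation_model.compile(optimizer='adam',
                              loss='categorical_crossentropy',
                              metrics=['accuracy'])
    translation_model.fit(X_train, y_train,
                          validation_split=0.1,
                          epochs=50,
                          batch_size=32,
                          callbacks=[keras.callbacks.TensorBoard(log_dir='logs')])

The TensorBoard callback in a setup like this is what would produce the training stats linked below.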

Chart: videos processed per category.

Chart: video count per label.

TensorBoard training stats

🔗 TensorBoard

Features

  • Real-time inference on live webcam or uploaded videos (see the sketch after this list)
  • Works offline once the model is loaded in the browser via TensorFlow.js
  • Custom vocabulary: you can retrain on your own signs by supplying new CSV keypoint trajectories
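A rough sketch of the real-time webcam loop, assuming extract_keypoints() is a hypothetical wrapper around the OpenPose body and hand models that returns the 156 flattened values per frame; translation_model and expression_mapping are the objects defined above:

    import collections

    import cv2
    import numpy as np

    SEQ_LEN = 20
    labels = list(expression_mapping.keys())
    window = collections.deque(maxlen=SEQ_LEN)  # sliding window of recent frames

    cap = cv2.VideoCapture(0)  # 0 = default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        window.append(extract_keypoints(frame))  # hypothetical OpenPose wrapper
        if len(window) == SEQ_LEN:
            batch = np.stack(window)[np.newaxis, ...]  # shape (1, 20, 156)
            probs = translation_model.predict(batch, verbose=0)[0]
            print(labels[int(np.argmax(probs))])
    cap.release()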

Future Expansion

  • Replace the LSTM with a Transformer to avoid the exploding- and vanishing-gradient problems of recurrent networks (a minimal sketch follows this list)
  • The model can be improved further with more data; Doordarshan has a large archive of news broadcasts with sign-language interpretation, and we are planning to request access to it from the Government of India
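To illustrate the first point, a self-attention encoder could replace the BiLSTM stack. The sketch below uses only standard Keras layers; every hyperparameter in it is a placeholder rather than a tested configuration, and it is not part of the current codebase:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    class PositionalEmbedding(layers.Layer):
        """Adds a learned embedding per frame position (attention alone is order-blind)."""
        def __init__(self, seq_len, d_model, **kwargs):
            super().__init__(**kwargs)
            self.seq_len = seq_len
            self.pos_emb = layers.Embedding(input_dim=seq_len, output_dim=d_model)

        def call(self, x):
            return x + self.pos_emb(tf.range(self.seq_len))

    def transformer_classifier(num_classes, seq_len=20, num_features=156,
                               d_model=64, num_heads=4):
        """Illustrative single-block Transformer encoder for sign classification."""
        inputs = keras.Input(shape=(seq_len, num_features))
        x = layers.Dense(d_model)(inputs)  # project keypoints to model width
        x = PositionalEmbedding(seq_len, d_model)(x)
        # Self-attention sub-layer with residual connection and normalization
        attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
        x = layers.LayerNormalization()(x + attn)
        # Position-wise feed-forward sub-layer with residual connection
        ff = layers.Dense(d_model * 2, activation='relu')(x)
        ff = layers.Dense(d_model)(ff)
        x = layers.LayerNormalization()(x + ff)
        x = layers.GlobalAveragePooling1D()(x)  # pool over the timesteps
        return keras.Model(inputs, layers.Dense(num_classes, activation='softmax')(x))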