Gesture_Recognition (gesture recognition) [GitHub address attached]


1, Project information

The project is developed with TensorFlow 2.1. It includes a self-built data set of six gestures, plus programs for image preprocessing, image augmentation, the network model, and recognition. It can help novices learn how to train a classification model on a self-made data set and then call the trained model for recognition.

Because the computer's GPU resources are limited, only 2400 images are used for training, and the images are processed as grayscale binary images. When saving the data set, I first save the augmented images to disk, then read them back and save them as NumPy arrays. As a result, a binary image that needs only two dimensions is still stored with three. You can modify the code to drop the extra dimension and reduce the amount of computation.
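The dimension reduction suggested above could be sketched as follows. This is a minimal example, assuming the saved binary images come back from disk as arrays of shape (height, width, 3) with three identical channels; the function name `to_single_channel` is hypothetical and not part of the project.

```python
import numpy as np

def to_single_channel(img):
    """Return a 2-D array if all channels of a 3-D image are identical."""
    if img.ndim == 3 and np.all(img == img[..., :1]):
        return img[..., 0]  # keep one channel, drop the redundant axis
    return img              # already 2-D, or the channels differ

# Stand-in for a binary image that was saved and read back with 3 channels.
binary = (np.random.rand(224, 224) > 0.5).astype(np.uint8) * 255
stacked = np.stack([binary] * 3, axis=-1)   # shape (224, 224, 3)
reduced = to_single_channel(stacked)        # shape (224, 224)
```

If the images are reduced this way, the model's input shape has to change to match, for example to (224, 224, 1).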

All data sets are self-made. They are for learning only and may not be used for commercial purposes.

2, Software environment

Software library and environment    Version information
Graphics card                       GTX1650

The IDE used for development is Spyder, which ships with Anaconda. During development, you can use Anaconda to create a virtual environment with Python 3.7 and install TensorFlow 2.x in it. Check which Python versions your TensorFlow release supports before creating the environment; TensorFlow 2.1, for example, supports Python 3.5 to 3.7. CUDA and cuDNN need to be installed according to your own computer's configuration. The installation process is tedious, so be patient. Good luck!
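Once the environment is created, a quick check like the one below can confirm what is installed. It is only a sketch: the function name `environment_report` is hypothetical, and the TensorFlow part runs only if TensorFlow can be found in the active environment.

```python
import importlib.util
import sys

def environment_report():
    """Collect the Python version and, if installed, TensorFlow details."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "tensorflow": None,
        "gpus": [],
    }
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
        # tf.config.list_physical_devices is available from TensorFlow 2.1 on.
        report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report

print(environment_report())
```

An empty `gpus` list with TensorFlow installed usually means the CUDA or cuDNN setup is not being picked up.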

3, Project structure

(1) File directory

+---img_augment # Augmented dataset
+---img_binary  # Gray binary data set
+---img_gray    # Gray data set
+---img_hsv     # HSV dataset
+---img_main    # Raw data set
+---weight      # Saved model file
+---(script)    # Training model
+---(script)    # Call the model for recognition
+---(script)    # Data augmentation
+---(script)    # Rename the augmented data set files
+---(script)    # Convert to grayscale image and save in img_gray
+---(script)    # Convert to HSV and save in img_hsv
+---(script)    # Rename the original data set files
+---(script)    # Convert the data set to NumPy format and save it
\---acc.png  # Model training acc chart

(2) Network model structure

Model: "functional_9" # alexnet network structure
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         [(None, 224, 224, 3)]     0         
zero_padding2d_4 (ZeroPaddin (None, 227, 227, 3)       0         
conv2d_20 (Conv2D)           (None, 55, 55, 48)        17472     
max_pooling2d_12 (MaxPooling (None, 27, 27, 48)        0         
conv2d_21 (Conv2D)           (None, 27, 27, 128)       153728    
max_pooling2d_13 (MaxPooling (None, 13, 13, 128)       0         
conv2d_22 (Conv2D)           (None, 13, 13, 192)       221376    
conv2d_23 (Conv2D)           (None, 13, 13, 192)       331968    
conv2d_24 (Conv2D)           (None, 13, 13, 128)       221312    
max_pooling2d_14 (MaxPooling (None, 6, 6, 128)         0         
flatten_4 (Flatten)          (None, 4608)              0         
dropout_8 (Dropout)          (None, 4608)              0         
dense_12 (Dense)             (None, 2048)              9439232   
dropout_9 (Dropout)          (None, 2048)              0         
dense_13 (Dense)             (None, 1024)              2098176   
dense_14 (Dense)             (None, 6)                 6150      
softmax_4 (Softmax)          (None, 6)                 0         
Total params: 12,489,414
Trainable params: 12,489,414
Non-trainable params: 0
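The summary above can be reproduced with a Keras functional model along the following lines. This is a sketch, not the author's code: the filter counts, kernel sizes, and strides are inferred from the output shapes and parameter counts in the summary, while the ReLU activations, the dropout rate of 0.2, and the asymmetric zero padding (1 pixel before, 2 after) are assumptions. Keras will also assign its own layer name suffixes.

```python
from tensorflow.keras import Model, layers

def build_alexnet(input_shape=(224, 224, 3), num_classes=6):
    """Reduced AlexNet whose shapes and parameter counts match the summary."""
    inputs = layers.Input(shape=input_shape)
    x = layers.ZeroPadding2D(((1, 2), (1, 2)))(inputs)            # 224 -> 227
    x = layers.Conv2D(48, 11, strides=4, activation="relu")(x)    # 55 x 55 x 48
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)               # 27 x 27 x 48
    x = layers.Conv2D(128, 5, padding="same", activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)               # 13 x 13 x 128
    x = layers.Conv2D(192, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(192, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)               # 6 x 6 x 128
    x = layers.Flatten()(x)                                       # 4608
    x = layers.Dropout(0.2)(x)                                    # rate is assumed
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dense(num_classes)(x)
    outputs = layers.Softmax()(x)
    return Model(inputs, outputs)

model = build_alexnet()
model.summary()
```

Most of the parameters sit in the first dense layer (4608 x 2048 weights plus 2048 biases), so shrinking that layer or the feature map feeding it is the quickest way to reduce the model size.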

(3) Training accuracy and prediction accuracy

[Figure: acc.png, model training accuracy chart]

4, Program description

Before running the project, create the img_gray, img_binary, img_hsv, img_augment and weight folders.

If you need to regenerate the pictures, first delete all the pictures in these folders, but do not delete the original data set.
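The folders named above can be created in one step. A minimal sketch, assuming the folder names from the file directory listing; the helper name `create_folders` is hypothetical.

```python
import os

# Folder names taken from the file directory listing in section 3.
FOLDERS = ["img_gray", "img_binary", "img_hsv", "img_augment", "weight"]

def create_folders(root="."):
    """Create the working folders under `root`, skipping any that exist."""
    for name in FOLDERS:
        os.makedirs(os.path.join(root, name), exist_ok=True)
```

Calling `create_folders()` from the project root is safe to repeat, because `exist_ok=True` leaves existing folders and their contents untouched.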

First, the original data set is augmented. Then grayscale binarization or HSV processing is applied, and finally the feature array and label array are generated. The neural network is then trained, and the trained model is called to predict results.


The program only modifies the file names of the original data set, which makes the files easier to view and read. Readers can capture their own data sets; remember to normalize them. For image normalization, please refer to the demo_CV file; the normalization program is not included in this project.
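A renaming step of this kind could look like the sketch below. The naming pattern (`gesture_0000.jpg` and so on) and the function name `rename_sequentially` are assumptions for illustration; the project's own naming scheme is not shown in this article.

```python
import os

def rename_sequentially(folder, prefix="gesture"):
    """Rename every file in `folder` to prefix_0000.ext, prefix_0001.ext, ..."""
    names = sorted(
        n for n in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, n))
    )
    # First pass: move to temporary names so that a new name can never
    # overwrite a file that is still waiting to be renamed.
    temp_names = []
    for i, name in enumerate(names):
        ext = os.path.splitext(name)[1].lower()
        tmp = "__tmp_%d%s" % (i, ext)
        os.rename(os.path.join(folder, name), os.path.join(folder, tmp))
        temp_names.append(tmp)
    # Second pass: assign the final sequential names.
    new_names = []
    for i, tmp in enumerate(temp_names):
        new = "%s_%04d%s" % (prefix, i, os.path.splitext(tmp)[1])
        os.rename(os.path.join(folder, tmp), os.path.join(folder, new))
        new_names.append(new)
    return new_names
```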


Augment the data set and save the augmented images in the img_augment folder.
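The article does not show which transformations the augmentation program applies. The sketch below uses three common ones (horizontal flip, horizontal shift, brightness scaling) purely as an assumption, implemented with NumPy so it runs without extra libraries; the function name `augment` and all parameter ranges are hypothetical.

```python
import numpy as np

def augment(img, rng):
    """Return a randomly flipped, shifted and brightness-scaled copy of `img`."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                   # horizontal flip
    shift = int(rng.integers(-10, 11))       # shift by up to 10 pixels
    out = np.roll(out, shift, axis=1)        # note: pixels wrap around the edge
    out = out * rng.uniform(0.8, 1.2)        # brightness change
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = np.full((224, 224, 3), 128, dtype=np.uint8)   # stand-in for a photo
copies = [augment(image, rng) for _ in range(4)]
```

For gestures whose meaning depends on left or right orientation, the flip should be left out.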


The program only modifies the file name of the augmented dataset.


Once the original data set is ready, you can run this function to convert the images to grayscale and save them in the img_gray folder (or modify the file path and save the binary images instead). Please note that in my program I convert the augmented data set to grayscale, binarize it, and save the result directly in the img_binary folder.
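The two operations described here, grayscale conversion followed by binarization, can be sketched in NumPy as below. This assumes RGB channel order and a fixed threshold of 127, neither of which is stated in the article; with OpenCV the usual route would be `cv2.cvtColor` and `cv2.threshold`, keeping in mind that OpenCV loads images in BGR order.

```python
import numpy as np

def to_gray(rgb):
    """Weighted sum of the R, G, B channels (ITU-R BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb[..., :3].astype(np.float32) @ weights
    return gray.round().astype(np.uint8)

def binarize(gray, threshold=127):
    """Pixels brighter than `threshold` become 255, all others 0."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (255, 255, 255)          # one white pixel, the rest black
binary = binarize(to_gray(rgb))
```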


Convert the images to HSV and save them in the img_hsv folder.
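As an illustration of the HSV step, the sketch below converts an RGB image with the standard library's `colorsys` module and then builds a binary mask from an HSV range. The function names and the example range are hypothetical, and the per-pixel conversion is slow on full-size images, so it is meant to show the idea rather than to replace an optimized library call.

```python
import colorsys
import numpy as np

# Apply colorsys.rgb_to_hsv element-wise; it returns an (h, s, v) tuple.
_rgb_to_hsv = np.vectorize(colorsys.rgb_to_hsv)

def to_hsv(rgb):
    """Convert a uint8 RGB image to float HSV with every channel in [0, 1]."""
    scaled = rgb.astype(np.float64) / 255.0
    h, s, v = _rgb_to_hsv(scaled[..., 0], scaled[..., 1], scaled[..., 2])
    return np.stack([h, s, v], axis=-1)

def hsv_mask(hsv, lower, upper):
    """Binarize: 255 where all three channels lie within [lower, upper]."""
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)

pixels = np.array([[[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)  # red, green
hsv = to_hsv(pixels)
mask = hsv_mask(hsv, lower=(0.2, 0.5, 0.5), upper=(0.5, 1.0, 1.0))
```

Here only the green pixel falls inside the range, so the mask keeps it and drops the red one. A range suited to skin tones would have to be found by experiment.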


The img_binary data set is converted to NumPy format and saved.
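Saving the feature and label arrays could be done as in the sketch below. The output file names (`x_data.npy`, `y_label.npy`) and the function name are hypothetical, and how the project derives each label (for example from the file name) is not shown in the article, so the labels are simply passed in.

```python
import os
import numpy as np

def save_dataset(images, labels, out_dir):
    """Stack equally sized images into one array and save features and labels."""
    x = np.stack(images)                     # shape (N, height, width[, channels])
    y = np.asarray(labels, dtype=np.int64)   # shape (N,)
    if len(x) != len(y):
        raise ValueError("images and labels must have the same length")
    np.save(os.path.join(out_dir, "x_data.npy"), x)
    np.save(os.path.join(out_dir, "y_label.npy"), y)
    return x.shape, y.shape
```

The arrays can be read back for training with `np.load`.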


Use the AlexNet neural network to train the model and save the model in the weight folder.


Load the trained model, open the camera, and pass each captured image into the model to obtain the prediction result.
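The prediction step itself reduces to adding a batch axis, calling the model, and taking the most probable class. The sketch below shows only that step, assuming the frame has already been preprocessed to the model's input shape. `predict_gesture`, `CLASS_NAMES`, and the stand-in model are hypothetical; in the real program `predict_fn` would be the loaded Keras model's `predict` method and the frame would come from `cv2.VideoCapture(0).read()`.

```python
import numpy as np

# Hypothetical class names; the article does not list the six gestures.
CLASS_NAMES = ["gesture_%d" % i for i in range(6)]

def predict_gesture(frame, predict_fn, class_names=CLASS_NAMES):
    """Return (class name, probability) for one preprocessed frame."""
    batch = frame.astype(np.float32)[np.newaxis, ...]   # add the batch axis
    probs = np.asarray(predict_fn(batch))[0]            # shape (num_classes,)
    idx = int(np.argmax(probs))
    return class_names[idx], float(probs[idx])

def fake_predict(batch):
    """Stand-in for model.predict that always favours class 2."""
    return np.array([[0.05, 0.05, 0.7, 0.1, 0.05, 0.05]] * len(batch))

frame = np.zeros((224, 224, 3), dtype=np.uint8)
name, prob = predict_gesture(frame, fake_predict)
```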

5, Model evaluation

Because the training set is small and overfitting is severe, the model has low recognition accuracy in complex environments. Images can also be binarized via HSV, but this project only saves the HSV images. Because of limited GPU resources, no model was trained on HSV-binarized images for comparison. Readers can try this themselves, or change the network structure or enlarge the data set, to improve the model's accuracy in complex environments. For novices, though, the project is enough to learn from.

gitee address:

GitHub address:

Keywords: Python github TensorFlow Deep Learning

Added by lkalik on Wed, 27 Oct 2021 21:06:57 +0300