Gesture_Recognition (gesture recognition)
1, Project information
The project is built on TensorFlow 2.1 and includes a self-made dataset of six gestures together with programs for image preprocessing, image augmentation, the network model, and recognition. It is meant to help beginners learn how to train a classification model on a self-made dataset and then call the trained model for recognition.
Because GPU resources were limited, only 2400 images were used for training, and the images are processed as grayscale binary images. When saving the dataset, I first save the augmented images to disk and then read them back and save them in NumPy format. As a result, each binary image, which is inherently two-dimensional (height by width), is still stored as a three-dimensional array with three channels. You can modify the code to drop the extra channels and reduce the amount of computation.
All datasets are self-made; they are for learning only and may not be used for commercial purposes.
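The channel reduction mentioned above could look roughly like the following. This is a minimal sketch, not the project's own code: the function name `to_single_channel` and the synthetic array are assumptions for illustration, and it assumes the three stored channels are identical copies of the binary image.

```python
import numpy as np

def to_single_channel(images):
    """Collapse an (N, H, W, 3) stack of binary images to (N, H, W, 1).

    Assumes the three channels are identical copies, which is what happens
    when a binary image is written to disk and read back as a colour image.
    """
    images = np.asarray(images)
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError("expected an array of shape (N, H, W, 3)")
    same = (np.array_equal(images[..., 0], images[..., 1])
            and np.array_equal(images[..., 0], images[..., 2]))
    if not same:
        raise ValueError("channels differ; not a replicated binary image")
    # Keep one channel but retain the trailing axis for convolutional input.
    return images[..., :1]

# Synthetic stand-in for the saved dataset: 4 binary images of 224 x 224.
rng = np.random.default_rng(0)
mask = (rng.random((4, 224, 224)) > 0.5).astype(np.uint8) * 255
stacked = np.repeat(mask[..., np.newaxis], 3, axis=-1)

reduced = to_single_channel(stacked)
print(stacked.shape, "->", reduced.shape)
```

If the dataset is reduced this way, the network's input layer would also have to be changed to a single-channel shape, since the model listed in this README expects three channels.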
2, Software environment
| Software library and environment | Version information |
The IDE used for development is Spyder, which ships with Anaconda. Use Anaconda to create a virtual environment with Python 3.7 and install TensorFlow 2.x in it. Check that your Python version is supported by the TensorFlow release you install. CUDA and cuDNN must be installed in versions that match your own computer configuration. The installation process is fairly cumbersome. Good luck!
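Once the environment is set up, a quick check like the one below can confirm what is actually installed. It is only a sketch: the helper name `describe_environment` is made up for this example, and it assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
import importlib.util
import sys

def describe_environment():
    """Report the Python version and, if installed, TensorFlow and visible GPUs."""
    info = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "tensorflow": None,
        "gpus": [],
    }
    # Import TensorFlow only if it is installed, so the check never crashes.
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
        info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return info

print(describe_environment())
```

An empty `gpus` list with TensorFlow installed usually means the CUDA or cuDNN installation does not match the TensorFlow build.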
3, Project structure
(1) File directory
+---img_augment             # Augmented dataset
+---img_binary              # Grayscale binary dataset
+---img_gray                # Grayscale dataset
+---img_hsv                 # HSV dataset
+---img_main                # Raw dataset
+---weight                  # Saved model file
+---alexNet.py              # Training model
+---demo_cv.py              # Call the model for recognition
+---img_aug.py              # Data augmentation
+---img_augment_rename.py   # Rename the files of the augmented dataset
+---img_gray.py             # Convert to grayscale images and save in img_gray
+---img_hsv.py              # Convert to HSV and save in img_hsv
+---img_main_rename.py      # Rename the files of the original dataset
+---train_test.py           # Convert the dataset to NumPy format and save it
\---acc.png                 # Model training accuracy chart
(2) Network model structure
Model: "functional_9"    # AlexNet network structure
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_5 (InputLayer)         [(None, 224, 224, 3)]     0
_________________________________________________________________
zero_padding2d_4 (ZeroPaddin (None, 227, 227, 3)       0
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 55, 55, 48)        17472
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 27, 27, 48)        0
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 27, 27, 128)       153728
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 13, 13, 128)       0
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 13, 13, 192)       221376
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 13, 13, 192)       331968
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 13, 13, 128)       221312
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 6, 6, 128)         0
_________________________________________________________________
flatten_4 (Flatten)          (None, 4608)              0
_________________________________________________________________
dropout_8 (Dropout)          (None, 4608)              0
_________________________________________________________________
dense_12 (Dense)             (None, 2048)              9439232
_________________________________________________________________
dropout_9 (Dropout)          (None, 2048)              0
_________________________________________________________________
dense_13 (Dense)             (None, 1024)              2098176
_________________________________________________________________
dense_14 (Dense)             (None, 6)                 6150
_________________________________________________________________
softmax_4 (Softmax)          (None, 6)                 0
=================================================================
Total params: 12,489,414
Trainable params: 12,489,414
Non-trainable params: 0
_________________________________________________________________
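The output shapes and parameter counts in the summary can be checked by hand. The sketch below does the arithmetic in plain Python. The kernel sizes, strides, and padding modes are not stated in this README; they are assumptions inferred from the shapes and parameter counts (for example, 17472 = 11 x 11 x 3 x 48 + 48 suggests an 11 x 11 kernel on the first convolution), and other pooling settings could produce the same shapes.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid" padding

def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units

# (kind, kernel, stride, padding, out_channels); all values are inferred.
layers = [
    ("conv", 11, 4, "valid", 48),
    ("pool", 3, 2, "valid", None),
    ("conv", 5, 1, "same", 128),
    ("pool", 3, 2, "valid", None),
    ("conv", 3, 1, "same", 192),
    ("conv", 3, 1, "same", 192),
    ("conv", 3, 1, "same", 128),
    ("pool", 3, 2, "valid", None),
]

size, channels, total = 227, 3, 0  # 224 x 224 input after zero padding to 227
shapes = []
for kind, kernel, stride, padding, out_ch in layers:
    size = out_size(size, kernel, stride, padding)
    if kind == "conv":
        total += conv_params(kernel, channels, out_ch)
        channels = out_ch
    shapes.append((size, size, channels))

flat = size * size * channels
units = flat
for out_units in (2048, 1024, 6):  # the three Dense layers in the summary
    total += dense_params(units, out_units)
    units = out_units

print(shapes)
print("flatten:", flat, "total params:", total)
```

Under these assumptions the script arrives at the same 4608 flattened features and 12,489,414 total parameters as the summary. Almost all of the parameters sit in the first Dense layer, which is one reason a small dataset overfits easily with this structure.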
(3) Training accuracy and prediction accuracy
[Image: acc.png, training accuracy and prediction accuracy chart]
4, Program description
Before starting the project, create the img_gray, img_binary, img_hsv, img_augment and weight folders.
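The folders can also be created from Python rather than by hand. A small sketch follows; the helper name `create_project_folders` is an assumption, while the folder names come from the file directory listed in this README.

```python
import os

# Folder names taken from the project's file directory listing.
FOLDERS = ["img_augment", "img_binary", "img_gray", "img_hsv", "weight"]

def create_project_folders(root="."):
    """Create the working folders under root; existing folders are left alone."""
    created = []
    for name in FOLDERS:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Calling `create_project_folders(".")` from the project directory creates any folder that is missing and does not touch img_main, so the original dataset stays as it is.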
If you need to regenerate the images, first delete all the images in these folders, but do not delete the original dataset.
First the original dataset is augmented, then grayscale binarization or HSV processing is applied, and finally the feature array and label array are generated. The neural network is then trained on these arrays, and the trained model is called to predict results.
img_main_rename.py: This program only renames the files of the original dataset so that they are easier to view and read. Readers can use their own datasets, but remember to normalize the images. The project does not include a separate normalization program; refer to the image normalization in demo_cv.py.
img_aug.py: Augments the dataset and saves the augmented images in the img_augment folder.
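This README does not say which transformations img_aug.py applies. As an illustration only, the sketch below produces three common variants of an image with NumPy: a horizontal mirror, a brightness shift, and a small translation. The function name `augment` and all numeric ranges are assumptions.

```python
import numpy as np

def augment(image, rng):
    """Return three simple variants of one H x W x C uint8 image."""
    # Horizontal mirror.
    flipped = image[:, ::-1].copy()
    # Random brightness offset, clipped back into the valid pixel range.
    offset = int(rng.integers(-40, 41))
    brightened = np.clip(image.astype(np.int16) + offset, 0, 255).astype(np.uint8)
    # Small translation; np.roll wraps pixels around the image border.
    dy, dx = (int(v) for v in rng.integers(-10, 11, size=2))
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    return [flipped, brightened, shifted]

# Random stand-in for one 224 x 224 colour image from the raw dataset.
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
variants = augment(image, rng)
print(len(variants), variants[0].shape, variants[0].dtype)
```

Mirroring swaps left and right hands, so it is only appropriate if the gesture classes do not depend on which hand is used.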
img_augment_rename.py: This program only renames the files of the augmented dataset.
img_gray.py: Once the original dataset is ready, run this program to convert the images to grayscale and save them in the img_gray folder (or modify the file path to save binary images). Note that in my program I convert the augmented dataset to grayscale, binarize it, and save the result directly in the img_binary folder.
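The grayscale and binarization steps can be sketched with NumPy alone. This is not the code from img_gray.py: the function names, the BGR channel order (the order OpenCV's `imread` returns), and the threshold of 127 are assumptions chosen for illustration.

```python
import numpy as np

def to_gray(image_bgr):
    """Weighted grayscale conversion of an H x W x 3 uint8 image in BGR order."""
    b, g, r = (image_bgr[..., i].astype(np.float32) for i in range(3))
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

def binarize(gray, threshold=127):
    """Pixels brighter than threshold become 255, all others become 0."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# Synthetic test image: a bright square on a black background.
image = np.zeros((224, 224, 3), dtype=np.uint8)
image[60:160, 80:180] = (200, 210, 220)

gray = to_gray(image)
binary = binarize(gray)
print(gray.shape, np.unique(binary))
```

A fixed threshold works on a plain background like this one; with uneven lighting the threshold usually has to be tuned per scene.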
img_hsv.py: Converts the images to HSV and saves them in the img_hsv folder.
train_test.py: Converts the img_binary dataset to NumPy format and saves it.
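Saving a dataset in NumPy format amounts to stacking the images into one feature array, building a matching label array, and writing both with `np.save`. The sketch below shows the idea with synthetic data. The file names `x_data.npy` and `y_label.npy` and the function `save_dataset` are hypothetical and are not taken from train_test.py.

```python
import os
import tempfile

import numpy as np

def save_dataset(images, labels, out_dir):
    """Stack images and labels into arrays and save them as .npy files."""
    x = np.stack(images).astype(np.uint8)   # shape (N, H, W) or (N, H, W, C)
    y = np.asarray(labels, dtype=np.int64)  # one class index per image
    if len(x) != len(y):
        raise ValueError("number of images and labels must match")
    x_path = os.path.join(out_dir, "x_data.npy")   # hypothetical file name
    y_path = os.path.join(out_dir, "y_label.npy")  # hypothetical file name
    np.save(x_path, x)
    np.save(y_path, y)
    return x_path, y_path

# Twelve synthetic binary images, two for each of the six gesture classes.
rng = np.random.default_rng(1)
images = [(rng.random((224, 224)) > 0.5).astype(np.uint8) * 255 for _ in range(12)]
labels = [i % 6 for i in range(12)]

with tempfile.TemporaryDirectory() as tmp:
    x_path, y_path = save_dataset(images, labels, tmp)
    x_loaded, y_loaded = np.load(x_path), np.load(y_path)
    print(x_loaded.shape, y_loaded.shape)
```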
alexNet.py: Trains the model with the AlexNet neural network and saves it in the weight folder.
demo_cv.py: Loads the trained model, opens the camera, and passes each captured image to the model to obtain the prediction result.
5, Model evaluation
Because the training set is small and overfitting is severe, the model has low recognition accuracy in complex environments. Images can also be binarized from their HSV representation. This project only saves the HSV images; because of limited GPU resources, no model was trained on HSV-binarized images for comparison. Readers can try this themselves, change the network structure, or enlarge the dataset to improve the model's accuracy in complex environments. For beginners, though, the project is sufficient for learning.
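For readers who want to try HSV binarization, the sketch below converts an RGB image to HSV with NumPy and keeps only the pixels inside a hue, saturation, and value window. Everything here is an assumption for illustration: the function names, the RGB channel order, and especially the window (hue 0 to 50 degrees, saturation at least 0.2, value at least 0.3), which is a rough guess at skin tones and would need tuning on real images.

```python
import numpy as np

def rgb_to_hsv(image_rgb):
    """Convert H x W x 3 uint8 RGB to HSV: hue in [0, 360), S and V in [0, 1]."""
    rgb = image_rgb.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    delta = v - rgb.min(axis=-1)
    safe = np.where(delta == 0, 1.0, delta)  # avoid dividing by zero on grey pixels
    sector = np.where(
        v == r, ((g - b) / safe) % 6,
        np.where(v == g, (b - r) / safe + 2, (r - g) / safe + 4),
    )
    h = np.where(delta == 0, 0.0, sector * 60.0)
    s = np.where(v == 0, 0.0, delta / np.where(v == 0, 1.0, v))
    return np.stack([h, s, v], axis=-1)

def hsv_binarize(image_rgb, h_range=(0.0, 50.0), s_min=0.2, v_min=0.3):
    """Return a 0/255 mask of the pixels inside the given HSV window."""
    hsv = rgb_to_hsv(image_rgb)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    inside = (h >= h_range[0]) & (h <= h_range[1]) & (s >= s_min) & (v >= v_min)
    return np.where(inside, 255, 0).astype(np.uint8)

# Synthetic scene: a skin-toned square on a blue background.
scene = np.zeros((224, 224, 3), dtype=np.uint8)
scene[...] = (0, 0, 255)
scene[60:160, 80:180] = (220, 160, 120)

mask = hsv_binarize(scene)
print(mask.shape, np.unique(mask))
```

Compared with a grayscale threshold, an HSV window depends less on overall brightness, which may help in cluttered scenes, although this README reports no experiment that confirms it for this dataset.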
GitHub address: https://github.com/AeneonLXC/pythonProject2/tree/master/gesture_recognition/gesture