TensorFlow by Google CNN machine learning foundations: EP #5 - classifying real world images

The picture classification structure is convenient for training and testing

1. Use convolution for complex images


#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# https://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.

In the previous experiment, you used the Fashion MNIST dataset to train the image classifier. In this case, your image is 28x28 with the theme centered. In this experiment, you will take it to a new level and train to recognize the features in the image, in which the subject can be located anywhere in the image!

You will do this by building a horse or man classifier that tells you whether a given image contains a horse or man, where the network is trained to identify features that determine which is which.

In the case of Fashion MNIST, the data is built into TensorFlow through Keras. In this case, the data is not so, so you must process it before you can train.

First, let's download the data:

!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip \
    -O /tmp/horse-or-human.zip

!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/validation-horse-or-human.zip \
    -O /tmp/validation-horse-or-human.zip
--2021-09-08 23:03:26--  https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip
Resolving storage.googleapis.com (storage.googleapis.com)...,,, ...
Connecting to storage.googleapis.com (storage.googleapis.com)||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149574867 (143M) [application/zip]
Saving to: '/tmp/horse-or-human.zip'

/tmp/horse-or-human 100%[===================>] 142.65M   139MB/s    in 1.0s    

2021-09-08 23:03:27 (139 MB/s) - '/tmp/horse-or-human.zip' saved [149574867/149574867]

--2021-09-08 23:03:27--  https://storage.googleapis.com/laurencemoroney-blog.appspot.com/validation-horse-or-human.zip
Resolving storage.googleapis.com (storage.googleapis.com)...,,, ...
Connecting to storage.googleapis.com (storage.googleapis.com)||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11480187 (11M) [application/zip]
Saving to: '/tmp/validation-horse-or-human.zip'

/tmp/validation-hor 100%[===================>]  10.95M  32.0MB/s    in 0.3s    

2021-09-08 23:03:28 (32.0 MB/s) - '/tmp/validation-horse-or-human.zip' saved [11480187/11480187]

The following python code will use the OS library to use the operating system library so that you can access the file system and the zipfile library so that you can decompress the data.

import os
import zipfile

local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
local_zip = '/tmp/validation-horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')

The contents of. zip are extracted into the base directory / TMP / horse or human, which in turn contains the horses and humans subdirectories, respectively.

In short: the training set is used to tell the neural network model "this is what a horse looks like", "this is what a man looks like" and other data.

One thing to note in this example is that we don't explicitly mark the image as a horse or man. If you remember the previous fashion example, we have marked "this is 1", "this is 7", etc.

You'll see later that something called an ImageGenerator is used -- it's encoded to read images from a subdirectory and automatically mark them from the name of that subdirectory. So, for example, you will have a "training" directory that contains a "horses" directory and a "humans" directory. The ImageGenerator marks the image appropriately for you, reducing the encoding steps.

Let's define each directory:

# Directory with our training horse pictures
train_horse_dir = os.path.join('/tmp/horse-or-human/horses')

# Directory with our training human pictures
train_human_dir = os.path.join('/tmp/horse-or-human/humans')

# Directory with our training horse pictures
validation_horse_dir = os.path.join('/tmp/validation-horse-or-human/horses')

# Directory with our training human pictures
validation_human_dir = os.path.join('/tmp/validation-horse-or-human/humans')

Now let's look at the file names in the horses and human training directories:

train_horse_names = os.listdir(train_horse_dir)

train_human_names = os.listdir(train_human_dir)
['horse42-9.png', 'horse15-4.png', 'horse43-0.png', 'horse40-1.png', 'horse16-0.png', 'horse15-5.png', 'horse09-0.png', 'horse02-7.png', 'horse47-0.png', 'horse32-7.png']
['human02-27.png', 'human14-25.png', 'human16-25.png', 'human09-04.png', 'human06-21.png', 'human05-29.png', 'human10-01.png', 'human08-13.png', 'human05-18.png', 'human17-18.png']

Let's find the total number of horse and man images in the catalog:

print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))
total training horse images: 500
total training human images: 527

Now let's look at some pictures to better understand their appearance. First, configure the matplot parameter:

%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4

# Index for iterating over images
pic_index = 0

Now, a batch of 8 horses and 8 people are shown. You can rerun the cells to view the new batch each time:

# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_horse_pix = [os.path.join(train_horse_dir, fname) 
                for fname in train_horse_names[pic_index-8:pic_index]]
next_human_pix = [os.path.join(train_human_dir, fname) 
                for fname in train_human_names[pic_index-8:pic_index]]

for i, img_path in enumerate(next_horse_pix+next_human_pix):
  # Set up subplot; subplot indices start at 1
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off') # Don't show axes (or gridlines)

  img = mpimg.imread(img_path)


1.1 build a small model from scratch

But before we continue, let's start defining the model:

Step 1 is to import tensorflow.

  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
import tensorflow as tf

Then we add the convolution layer as in the previous example and flatten the final result to feed the densely connected layers.

Finally, we add densely connected layers.

Please note that because we are facing a binary classification problem, that is, binary classification problem, we will sigmoid activation End our network, so the output of our network will be a single scalar between 0 and 1, and the coding probability of the current image is class 1 (opposite to class 0).

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', 
                           input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')

The model.summary() method calls to print a summary of the neural network

Model: "sequential"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 298, 298, 16)      448       
max_pooling2d (MaxPooling2D) (None, 149, 149, 16)      0         
conv2d_1 (Conv2D)            (None, 147, 147, 32)      4640      
max_pooling2d_1 (MaxPooling2 (None, 73, 73, 32)        0         
conv2d_2 (Conv2D)            (None, 71, 71, 64)        18496     
max_pooling2d_2 (MaxPooling2 (None, 35, 35, 64)        0         
flatten (Flatten)            (None, 78400)             0         
dense (Dense)                (None, 512)               40141312  
dense_1 (Dense)              (None, 1)                 513       
Total params: 40,165,409
Trainable params: 40,165,409
Non-trainable params: 0

The output shape column shows how the size of your feature map evolves in each successive layer. Due to filling, the convolution layer reduces the size of the characteristic graph a little, and the size of each pool layer is halved.

Next, we will configure the specification of model training. We'll use binary_crossentropy loss trains our model because it is a binary classification problem, and our final activation is a sigmoid. (for a review of loss indicators, see Machine learning rmsprop crash course ) we will use an optimizer with a learning rate of 0.001. During the training, we will monitor the classification accuracy.

Note: in this case, use RMSprop optimization algorithm than Random gradient descent (SGD) It is preferable because RMSprop automatically adjusts the learning rate for us. (other optimizers, such as Adam and Adagrad , it will also automatically adjust the learning rate during training, and it can work normally here.)

from tensorflow.keras.optimizers import RMSprop

/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:356: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  "The `lr` argument is deprecated, use `learning_rate` instead.")

1.2 data preprocessing

Let's set up a data generator that will read the pictures in our source folder, convert them into float32 tensors, and provide them (with their labels) to our network. We will have a generator for training images and a generator for verifying images. Our generator will generate batch images with a size of 300x300 and their labels (binary).

You may already know that the data entering the neural network should usually be standardized in some way to make it easier for the network to process. (it is not common to input the original pixels into convnet.) in our example, we will preprocess our image by normalizing the pixel values in the range of [0,1] (initially all values are in the range of [0,255]).

In Keras, this can be done through the class of keras.preprocessing.image.ImageDataGenerator using the rescale parameter. This type of ImageDataGenerator allows you to use. flow(data, labels) or instantiate a generator. Flow for enhancing image batches (and their labels)_ from_ directory(directory). These generators can then be used in the Keras model method with accepted data generation as input: fit_generator´╝îevaluate_generator, and predict_generator.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)

# Flow training images in batches of 128 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        '/tmp/horse-or-human/',  # This is the source directory for training images
        target_size=(300, 300),  # All images will be resized to 150x150
        # Since we use binary_crossentropy loss, we need binary labels

validation_datagen = ImageDataGenerator(rescale=1/255)

# Flow training images in batches of 128 using train_datagen generator
validation_generator = validation_datagen.flow_from_directory(
        '/tmp/validation-horse-or-human/',  # This is the source directory for training images
        target_size=(300, 300),  # All images will be resized to 150x150
        # Since we use binary_crossentropy loss, we need binary labels

Found 1027 images belonging to 2 classes.
Found 256 images belonging to 2 classes.

1.3 training

Let's train for 15 periods - this may take a few minutes to run.

Note the value for each period.

Loss and accuracy are good indicators of training progress. It guesses the classification of training data, then measures it according to the known labels and calculates the results. Accuracy is part of the right guess.

history = model.fit(
      validation_data = validation_generator,  
Epoch 1/15
8/8 [==============================] - 41s 1s/step - loss: 5.8897 - accuracy: 0.4705 - val_loss: 0.5922 - val_accuracy: 0.7539
Epoch 2/15
8/8 [==============================] - 8s 1s/step - loss: 0.6112 - accuracy: 0.6885 - val_loss: 1.1099 - val_accuracy: 0.6367
Epoch 3/15
8/8 [==============================] - 8s 1s/step - loss: 0.2795 - accuracy: 0.8587 - val_loss: 3.0112 - val_accuracy: 0.6367
Epoch 4/15
8/8 [==============================] - 8s 1s/step - loss: 0.2430 - accuracy: 0.9021 - val_loss: 1.4884 - val_accuracy: 0.7734
Epoch 5/15
8/8 [==============================] - 8s 1s/step - loss: 0.0752 - accuracy: 0.9689 - val_loss: 4.5065 - val_accuracy: 0.6250
Epoch 6/15
8/8 [==============================] - 8s 1s/step - loss: 0.5074 - accuracy: 0.8576 - val_loss: 4.8991 - val_accuracy: 0.6484
Epoch 7/15
8/8 [==============================] - 10s 1s/step - loss: 0.1014 - accuracy: 0.9677 - val_loss: 1.9106 - val_accuracy: 0.7773
Epoch 8/15
8/8 [==============================] - 9s 1s/step - loss: 0.0136 - accuracy: 0.9980 - val_loss: 2.7661 - val_accuracy: 0.7578
Epoch 9/15
8/8 [==============================] - 9s 1s/step - loss: 0.3347 - accuracy: 0.8701 - val_loss: 1.4895 - val_accuracy: 0.7266
Epoch 10/15
8/8 [==============================] - 8s 1s/step - loss: 0.0438 - accuracy: 0.9878 - val_loss: 1.2067 - val_accuracy: 0.8438
Epoch 11/15
8/8 [==============================] - 8s 1s/step - loss: 0.0137 - accuracy: 0.9967 - val_loss: 2.5188 - val_accuracy: 0.8047
Epoch 12/15
8/8 [==============================] - 8s 1s/step - loss: 0.0026 - accuracy: 1.0000 - val_loss: 2.6172 - val_accuracy: 0.8047
Epoch 13/15
8/8 [==============================] - 8s 1s/step - loss: 0.0023 - accuracy: 1.0000 - val_loss: 2.7464 - val_accuracy: 0.8125
Epoch 14/15
8/8 [==============================] - 8s 1s/step - loss: 0.0159 - accuracy: 0.9933 - val_loss: 29.6021 - val_accuracy: 0.5000
Epoch 15/15
8/8 [==============================] - 9s 1s/step - loss: 3.6140 - accuracy: 0.8242 - val_loss: 1.9420 - val_accuracy: 0.7695

1.4 operation model

Now let's look at actually running the forecast using the model. This code will allow you to select one or more files from the file system, upload them, and run them in the model, indicating whether the object is a horse or a man.

You can download the image from the Internet to your file system for trial!

Please note that you may see that the network has made many mistakes, although the training accuracy is higher than 99%.

This is due to over fitting, which means that the neural network is trained with very limited data - there are only about 500 images in each category. Therefore, it is very good at identifying images similar to those in the training set, but it may fail a lot when identifying images not in the training set.

This is a data point, which proves that the more data you train, the better your final network will be!

Although the data is limited, there are still many technologies that can be used to make your training better, including a technology called image enhancement. This is beyond the scope of this laboratory!

import numpy as np
from google.colab import files
from keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():
  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(300, 300))
  x = image.img_to_array(img)
  x = x / 255
  x = np.expand_dims(x, axis=0)
  images = np.vstack([x])
  classes = model.predict(images, batch_size=10)
  if classes[0]>0.5:
    print(fn + " is a human")
    print(fn + " is a horse")

You can select multiple pictures at a time

9 files
horse1.jpeg(image/jpeg) - 9533 bytes, last modified: 9/9/2021 - 100% done
horse2.jpeg(image/jpeg) - 6028 bytes, last modified: 9/9/2021 - 100% done
horse3.jpeg(image/jpeg) - 5545 bytes, last modified: 9/9/2021 - 100% done
horse4.jpeg(image/jpeg) - 7278 bytes, last modified: 9/9/2021 - 100% done
lady1.jpeg(image/jpeg) - 5549 bytes, last modified: 9/9/2021 - 100% done
lady2.jpeg(image/jpeg) - 5713 bytes, last modified: 9/9/2021 - 100% done
lady3.jpeg(image/jpeg) - 8627 bytes, last modified: 9/9/2021 - 100% done
lady4.jpeg(image/jpeg) - 6872 bytes, last modified: 9/9/2021 - 100% done
lady5.jpeg(image/jpeg) - 5807 bytes, last modified: 9/9/2021 - 100% done
Saving horse1.jpeg to horse1 (1).jpeg
Saving horse2.jpeg to horse2.jpeg
Saving horse3.jpeg to horse3.jpeg
Saving horse4.jpeg to horse4.jpeg
Saving lady1.jpeg to lady1.jpeg
Saving lady2.jpeg to lady2.jpeg
Saving lady3.jpeg to lady3.jpeg
Saving lady4.jpeg to lady4.jpeg
Saving lady5.jpeg to lady5.jpeg
horse1.jpeg is a horse
horse2.jpeg is a horse
horse3.jpeg is a horse
horse4.jpeg is a human
lady1.jpeg is a horse
lady2.jpeg is a horse
lady3.jpeg is a horse
lady4.jpeg is a human
lady5.jpeg is a horse

1.5 visual intermediate representation

In order to understand what features our convnet has learned, an interesting thing is how visual input is converted through convnet.

Let's randomly select an image from the training set, and then generate a graph. Each row is the output of a layer, and each image in the row is a specific filter in the output feature graph. Rerun this cell to generate an intermediate representation of the various training images.

import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
#visualization_model = Model(img_input, successive_outputs)
visualization_model = tf.keras.models.Model(inputs = model.input, outputs = successive_outputs)
# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)  # Numpy array with shape (150, 150, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 150, 150, 3)

# Rescale by 1/255
x /= 255

# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# These are the names of the layers, so can have them as part of our plot
layer_names = [layer.name for layer in model.layers]

# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 4:
    # Just do this for the conv / maxpool layers, not the fully-connected layers
    n_features = feature_map.shape[-1]  # number of features in feature map
    # The feature map has shape (1, size, size, n_features)
    size = feature_map.shape[1]
    # We will tile our images in this matrix
    display_grid = np.zeros((size, size * n_features))
    for i in range(n_features):
      # Postprocess the feature to make it visually palatable
      x = feature_map[0, :, :, i]
      x -= x.mean()
      # if x>0:
      #   x /= x.std()
      x *= 64
      x += 128
      x = np.clip(x, 0, 255).astype('uint8')
      # We'll tile each filter into this big horizontal grid
      display_grid[:, i * size : (i + 1) * size] = x
    # Display the grid
    scale = 20. / n_features
    plt.figure(figsize=(scale * n_features, scale))
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

As you can see, we move from the original pixels of the image to an increasingly abstract and compact representation. Downstream representations begin to highlight the content concerned by the network, and they show that fewer and fewer features are "activated"; Most are set to zero. This is called sparsity. Representation sparsity is a key feature of deep learning.

These representations carry less and less information about the original pixels of the image, but more and more fine information about the image category. You can think of convnet (or general deep network) as an information distillation pipeline.

1.6 cleaning

Before running the next exercise, run the following cells to terminate the kernel and free up memory resources:

import os, signal
os.kill(os.getpid(), signal.SIGKILL)

reference resources


Keywords: TensorFlow Pytorch Deep Learning Convolutional Neural Networks CNN

Added by php_guest on Fri, 10 Sep 2021 03:19:16 +0300