Computer vision 2.3: image feature vector extraction and application of transfer learning

This article discusses transfer learning in computer vision: the ability to take a pre-trained model and apply it to datasets other than the one it was originally trained on.

For example:

Suppose there are two different datasets, A and B, and our task is to classify the images in each of them.

Conventional practice: train model X on dataset A and a separate model Y on dataset B.

Transfer learning: train model X on dataset A, adapt the trained model, and then continue training X on dataset B.


Deep neural networks that have been trained on large datasets such as ImageNet perform well in transfer learning. Reusing their learned convolution kernels is usually more effective than training new kernels from scratch.

Generally speaking, there are two types of transfer learning applied to deep learning computer vision:

  • Model X is used as the feature extractor, and then the extracted features are used as the input of other machine learning algorithms.
  • Remove the FC (fully connected) layers of model X, replace them with new FC layers, and then fine-tune the weights.

This article will focus on the first type.

Feature extraction using trained CNN

So far, we have treated a convolutional neural network as an end-to-end classifier:

  1. Input images into the network
  2. Propagate the image forward through the entire network
  3. Obtain the classification probability from the end of the network

However, nothing requires us to pass the image through the whole network. We can stop at any layer, such as an activation or pooling layer, take the values the network has produced at that point, and use them as a feature vector.
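The idea of stopping part-way can be sketched with a toy stand-in for a network. This is not a real CNN: the layer list, the names `relu1`/`pool1`/`relu2`/`pool2`, and the helper `forward_until` are all hypothetical and exist only to show that any intermediate output can be flattened into a feature vector.

```python
import numpy as np


def relu(x):
    # Element-wise activation.
    return np.maximum(x, 0)


def max_pool_2x2(x):
    # x has shape (H, W, C) with even H and W; take the max over each 2x2 window.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


# Hypothetical toy "network": an ordered list of named layers.
layers = [
    ("relu1", relu),
    ("pool1", max_pool_2x2),
    ("relu2", relu),
    ("pool2", max_pool_2x2),
]


def forward_until(x, stop_at):
    # Run the layers in order and return the flattened output of `stop_at`.
    for name, layer in layers:
        x = layer(x)
        if name == stop_at:
            return x.flatten()
    raise KeyError(stop_at)


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 3))

feat = forward_until(image, "pool1")       # stop early
feat_deep = forward_until(image, "pool2")  # stop one pooling layer later
print(feat.shape, feat_deep.shape)
```

Stopping later gives a shorter vector here because each pooling layer halves the spatial size. With a real pre-trained network the same trade-off applies between earlier and later layers.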

We can run every image in a dataset through this procedure to extract its feature vector, and then use the extracted feature vectors to train a standard machine learning model (such as a linear SVM, a logistic regression classifier, or a random forest).

Note that in this process the convolutional neural network does not perform the classification itself. We use it only as a feature extractor, and the downstream machine learning classifier is responsible for learning the underlying patterns from the extracted features.

Getting to know HDF5

HDF5 is a binary data format created by The HDF Group. It is used to store very large datasets on disk while keeping the data easy to access and manipulate.

The data in HDF5 is stored hierarchically, which is very similar to the way the file system stores data.

  • Group: data is first defined in a group. A group is like a container. It can hold data sets and other groups.

  • Dataset: once the group is defined, a dataset can be created inside it. A dataset can be regarded as a multidimensional array whose elements all share one data type.
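The group and dataset hierarchy can be illustrated with a short h5py sketch. The file name `demo.hdf5`, the group name `flowers`, and the array contents are made up for illustration; the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file in a temporary directory (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")

with h5py.File(path, "w") as db:
    # A group behaves like a folder and can hold datasets or other groups.
    flowers = db.create_group("flowers")
    # A dataset is a typed, multidimensional array stored inside a group.
    flowers.create_dataset("features", data=np.zeros((10, 4), dtype="float32"))
    flowers.create_dataset("labels", data=np.arange(10, dtype="int64"))

with h5py.File(path, "r") as db:
    # Members are addressed with file-system-like paths.
    names = sorted(db["flowers"].keys())
    shape = db["flowers/features"].shape
    print(names, shape)
```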

HDF5 is written in C, but with the h5py module we can use Python to work with the underlying C API.

The great thing about HDF5 is how easy it makes working with data. We can store a large amount of data in an HDF5 dataset and manipulate it much as we would a NumPy array.

When using HDF5 through h5py, you can treat your data as one huge NumPy array. Even when this array is too large to fit in memory, we can still operate on it through HDF5.
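As a rough sketch of this NumPy-like access, the example below allocates a dataset on disk, fills it in chunks with slice assignment, and reads back only a few rows. The file name `big.hdf5`, the dataset name `data`, and the sizes are arbitrary choices for the demonstration.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big.hdf5")

with h5py.File(path, "w") as db:
    # Allocate the dataset on disk without building the full array in memory.
    data = db.create_dataset("data", shape=(1000, 8), dtype="float32")
    # Fill it chunk by chunk using NumPy-style slice assignment.
    for start in range(0, 1000, 250):
        data[start:start + 250] = np.full((250, 8), start, dtype="float32")

with h5py.File(path, "r") as db:
    # Only the requested rows are read; the result is an ordinary NumPy array.
    rows = db["data"][250:255]
    print(type(rows).__name__, rows.shape)
```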

Best of all, the HDF5 format is standardized, which means that data stored in HDF5 can be read by other developers working in different languages, such as C, MATLAB, and Java.

Write data to HDF5

If a worker wants to do well, he must sharpen his tools first.

Before we start our formal work, we need to write a small tool to read and write HDF5 files.

Directory structure:

|		|
|		|----callbacks
|		|----inputoutput
|		|		|
|		|		|
|		|----nn
|		|----preprocessing
|		|----utils
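import h5py
import os


class HDF5DatasetWriter:
    # NOTE: the default bufSize=1000 is an assumption; callers may pass their own value
    def __init__(self, dims, outputPath, dataKey="images", bufSize=1000):
        # refuse to overwrite an existing file
        if os.path.exists(outputPath):
            raise ValueError("The supplied 'outputPath' already exists and "
                             "cannot be overwritten. Manually delete the file before continuing", outputPath)

        # open the file for writing and create one dataset for the data
        # and one for the integer class labels
        self.db = h5py.File(outputPath, "w")
        self.data = self.db.create_dataset(dataKey, dims, dtype="float")
        self.labels = self.db.create_dataset("labels", (dims[0],), dtype="int")

        # in-memory buffer and the index of the next free row in the datasets
        self.bufsize = bufSize
        self.buffer = {"data": [], "labels": []}
        self.idx = 0

    def add(self, rows, labels):
        # append the rows and labels to the buffer
        self.buffer["data"].extend(rows)
        self.buffer["labels"].extend(labels)

        # write to disk once the buffer is full
        if len(self.buffer["data"]) >= self.bufsize:
            self.flush()

    def flush(self):
        # write the buffer to disk, then reset it
        i = self.idx + len(self.buffer["data"])
        self.data[self.idx:i] = self.buffer["data"]
        self.labels[self.idx:i] = self.buffer["labels"]
        self.idx = i
        self.buffer = {"data": [], "labels": []}

    def storeClassLabels(self, classLabels):
        # store the class names as variable-length strings
        dt = h5py.special_dtype(vlen=str)
        labelSet = self.db.create_dataset("label_names", (len(classLabels),), dtype=dt)
        labelSet[:] = classLabels

    def close(self):
        # write any remaining buffered data before closing the file
        if len(self.buffer["data"]) > 0:
            self.flush()
        self.db.close()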

The class above works with the dataset in the HDF5 file through a few methods:

flush: write the buffered data to the file, then empty the buffer.

add: append rows and their labels to the buffer. When the number of buffered rows reaches the buffer size, call flush.

storeClassLabels: write the name of each class to the file as strings.

close: close the file. If the buffer still holds data, call flush first so that it is written.
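To make the buffering behaviour concrete, here is a simplified stand-in that follows the same add/flush/close pattern but writes into a preallocated NumPy array instead of an HDF5 file. The class name `BufferedArrayWriter`, its parameters, and the `flushes` counter are hypothetical and are not part of the article's writer.

```python
import numpy as np


class BufferedArrayWriter:
    """Hypothetical stand-in: same buffering idea, NumPy array as the target."""

    def __init__(self, n_rows, n_cols, buf_size):
        self.data = np.zeros((n_rows, n_cols))
        self.buf_size = buf_size
        self.buffer = []
        self.idx = 0      # next free row in self.data
        self.flushes = 0  # number of times the buffer was written out

    def add(self, rows):
        # Collect rows in memory; write them out once the buffer is full.
        self.buffer.extend(rows)
        if len(self.buffer) >= self.buf_size:
            self.flush()

    def flush(self):
        # Copy the buffered rows into the target, then empty the buffer.
        end = self.idx + len(self.buffer)
        self.data[self.idx:end] = self.buffer
        self.idx = end
        self.buffer = []
        self.flushes += 1

    def close(self):
        # Write whatever is still buffered.
        if self.buffer:
            self.flush()


writer = BufferedArrayWriter(n_rows=5, n_cols=2, buf_size=2)
for k in range(5):
    writer.add([[k, k]])

rows_before_close = writer.idx  # the last row is still sitting in the buffer
writer.close()
print(rows_before_close, writer.idx, writer.flushes)
```

The last row only reaches the target when close is called, which is why forgetting to call close can silently drop the tail of the data.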

Feature extraction

Create a Python file and write the following code:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from sklearn.preprocessing import LabelEncoder
# the package name must match your own directory
# (it is shown as 'inputoutput' in the directory structure above)
from inputoutput.hdf5datasetwriter import HDF5DatasetWriter
from imutils import paths
import numpy as np
import progressbar
import random
import os

dataset = "/Users/lingg/Desktop/dataset/Flower17-master/dataset/train"
output = "/Users/lingg/PycharmProjects/DLstudy/feature/flower-17/hdf5/feature.hdf5"
batchsize = 32
bufferSize = 1000

bs = batchsize
print("[INFO] loading images...")
imagePaths = list(paths.list_images(dataset))
# shuffle so that the 75/25 split used later contains every class in both parts
random.shuffle(imagePaths)

# the class name is the directory that contains the image
labels = [p.split(os.path.sep)[-2] for p in imagePaths]
le = LabelEncoder()
labels = le.fit_transform(labels)

print("[INFO] loading network...")
# include_top=False drops the fully connected layers, so the network
# ends at the last pooling layer
model = VGG16(weights="imagenet", include_top=False)

# each image yields a 7 x 7 x 512 volume, flattened to 512 * 7 * 7 values
dataset = HDF5DatasetWriter((len(imagePaths), 512 * 7 * 7), output, dataKey="features", bufSize=bufferSize)
dataset.storeClassLabels(le.classes_)

widgets = ["Extracting Features: ", progressbar.Percentage(), " ", progressbar.Bar(), " ", progressbar.ETA()]
pbar = progressbar.ProgressBar(maxval=len(imagePaths), widgets=widgets).start()

for i in np.arange(0, len(imagePaths), bs):
    batchPaths = imagePaths[i:i + bs]
    batchLabels = labels[i:i + bs]
    batchImages = []

    for (j, imagePath) in enumerate(batchPaths):
        # load the image at the size VGG16 expects and preprocess it
        image = load_img(imagePath, target_size=(224, 224))
        image = img_to_array(image)
        image = np.expand_dims(image, axis=0)
        image = imagenet_utils.preprocess_input(image)
        batchImages.append(image)

    # run the batch through the network and flatten each output volume
    batchImages = np.vstack(batchImages)
    features = model.predict(batchImages, batch_size=bs)
    features = features.reshape((features.shape[0], 512 * 7 * 7))
    dataset.add(features, batchLabels)
    pbar.update(i)

# write any buffered features and close the file
dataset.close()
pbar.finish()


The variable dataset is the directory where the dataset is located. We use the flower-17 dataset.

The variable output is the path where the extracted features are stored.

Note that the file feature.hdf5 is generated by the program and does not need to be created manually, but its parent directory must exist beforehand. In this article, /Users/lingg/PycharmProjects/DLstudy/feature/flower-17/hdf5/ has to be created in advance; otherwise the program reports that the target path does not exist.
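If you would rather not create the directory by hand, one option is to create it from the script before opening the file. The sketch below uses a temporary directory as a stand-in for the real output location, so the `output` path here is illustrative only.

```python
import os
import tempfile

# Stand-in for the article's output path (illustrative location).
base = tempfile.mkdtemp()
output = os.path.join(base, "feature", "flower-17", "hdf5", "feature.hdf5")

# Create every missing parent directory; exist_ok=True makes the call safe to repeat.
os.makedirs(os.path.dirname(output), exist_ok=True)

print(os.path.isdir(os.path.dirname(output)))
```

Only the parent directory is created. The file itself must still not exist, because the writer refuses to overwrite an existing file.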

In addition, because feature extraction takes a long time, we use the progressbar package to display the current progress. You can install it with the following command:

pip install progressbar

It is optional, so you can leave it out if you prefer.

After executing the program, we can see the feature.hdf5 file in the corresponding directory.

What is saved in this file is the features extracted from the flower-17 dataset, which we will use to train the classifier in the following steps.
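The length of each stored feature vector follows from the VGG16 architecture: the network applies five 2x2 max-pooling layers, and its last convolutional block has 512 channels. A quick check of the arithmetic (the variable names are only for illustration):

```python
# Each of VGG16's five 2x2 max-pooling layers halves the spatial size.
input_size = 224
num_pools = 5
spatial = input_size // (2 ** num_pools)

channels = 512  # depth of the last convolutional block in VGG16
feature_dim = channels * spatial * spatial

print(spatial, feature_dim)  # 7 25088
```

So every image is represented by 25088 values, which matches the 512 * 7 * 7 dimension used in the extraction code.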

Training a classifier on the extracted features

As we know, training a simple linear classifier directly on raw image pixels does not give good results. What if we train it on the features extracted above?

Let's find out.

Create another Python file and write the following code:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
import pickle
import h5py
import numpy as np

db = "/Users/lingg/PycharmProjects/DLstudy/feature/flower-17/hdf5/feature.hdf5"
model_path = "/Users/liushanlin/PycharmProjects/DLstudy/model/animals.cpickle"
jobs = -1

# open the feature file; the first 75% of the rows are used for training
db = h5py.File(db, "r")
i = int(db["labels"].shape[0] * 0.75)
print("[INFO] tuning hyperparameters...")
params = {"C": [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]}
model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=jobs)
trainX = db["features"][:i]
trainY = db["labels"][:i]
testX = db["features"][i:]
testY = db["labels"][i:]

# class names may be stored as bytes, so decode them to str where needed
targets = np.array([t.decode("utf-8") if isinstance(t, bytes) else t
                    for t in db["label_names"][:]])
print(targets)

# run the cross-validated grid search on the training split
model.fit(trainX, trainY)
print("[INFO] best hypermeters:{}".format(model.best_params_))

print("[INFO] evaluating...")
preds = model.predict(testX)
print(classification_report(testY, preds, target_names=targets))

# serialize the best estimator found by the grid search
print("[INFO] saving model...")
with open(model_path, "wb") as f:
    pickle.dump(model.best_estimator_, f)

db.close()

In this code:

The variable db holds the path to the feature file produced in the previous step.

The variable model_path is the path where the trained classifier is serialized and stored.
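A serialized classifier can later be loaded back and used for prediction. The sketch below shows the round trip on a tiny synthetic problem. The random data, the file name `demo.cpickle`, and the temporary directory are assumptions made for the example and have nothing to do with the flower features.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic two-class problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

# Serialize the trained classifier to disk ...
model_path = os.path.join(tempfile.mkdtemp(), "demo.cpickle")
with open(model_path, "wb") as f:
    pickle.dump(clf, f)

# ... and load it back later, for example in a separate script.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same = np.array_equal(clf.predict(X), restored.predict(X))
print(same)
```

Pickle files should only be loaded from sources you trust, and ideally with the same scikit-learn version that wrote them.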

The code classifies the features with logistic regression and uses GridSearchCV, which runs a cross-validated search over candidate hyperparameter values. Six values of C, the inverse regularization strength, are compared: {"C": [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]}, and the one with the best cross-validation score is selected. For details on usage, refer to the official scikit-learn documentation.
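To see what the grid search does on a small scale, the following sketch runs it on synthetic data and prints the mean cross-validation score for each candidate. The dataset from make_classification, the shorter list of C values, and max_iter=1000 are choices made for this example, not settings from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the extracted features.
X, y = make_classification(n_samples=120, n_features=10, n_informative=5, random_state=0)

params = {"C": [0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), params, cv=3)
search.fit(X, y)

# One mean cross-validation score per candidate value of C.
for C, score in zip(search.cv_results_["param_C"], search.cv_results_["mean_test_score"]):
    print(C, round(float(score), 3))

# The candidate with the best mean score; by default the model is refit with it.
print(search.best_params_)
```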

A problem you may encounter when using GridSearchCV in this code is the error: 'ascii' codec can't encode characters in position 18-20.

Output:

[INFO] tuning hyperparameters...
['bluebell' 'buttercup' 'colts_foot' 'cowslip' 'crocus' 'daffodil' 'daisy'
 'dandelion' 'fritillary' 'iris' 'lily_valley' 'pansy' 'snowdrop'
 'sunflower' 'tigerlily' 'tulip' 'windflower']

[INFO] best hypermeters:{'C': 1.0}
[INFO] evaluating...
              precision    recall  f1-score   support

    bluebell       0.96      0.96      0.96        26
   buttercup       0.93      1.00      0.97        14
  colts_foot       1.00      0.94      0.97        17
     cowslip       0.74      0.88      0.80        16
      crocus       0.74      0.93      0.82        15
    daffodil       0.88      0.94      0.91        16
       daisy       0.94      0.89      0.92        19
   dandelion       0.94      0.88      0.91        17
  fritillary       0.94      0.89      0.91        18
        iris       1.00      0.89      0.94        19
 lily_valley       0.83      0.94      0.88        16
       pansy       1.00      0.88      0.93        16
    snowdrop       0.62      0.81      0.70        16
   sunflower       1.00      1.00      1.00        16
   tigerlily       1.00      1.00      1.00        22
       tulip       1.00      0.58      0.73        19
  windflower       0.94      0.94      0.94        16

    accuracy                           0.90       298
   macro avg       0.91      0.90      0.90       298
weighted avg       0.92      0.90      0.90       298

[INFO] saving model...

Process finished with exit code 0

The output shows that grid search selected C=1.0 as the best parameter.

More surprisingly, simple logistic regression reached 90% accuracy on the test split, thanks to the features extracted by VGG16.

Keywords: Machine Learning Computer Vision

Added by khaitan_anuj on Mon, 25 Oct 2021 11:57:50 +0300