[RotNet self-supervised learning] Predicting the image rotation angle

Paper Overview

RotNet performs self-supervised learning by predicting image rotations.

This paper was published at ICLR 2018 and has been cited more than 1100 times. Its core intuition: if someone does not understand the concept of the object depicted in an image, they cannot recognize the rotation that has been applied to that image.

In this article, we review "Unsupervised Representation Learning by Predicting Image Rotations" from Université Paris-Est. With RotNet, image features are learned by training ConvNets to recognize the 2D rotation applied to the input image. With this method, an unsupervised pre-trained AlexNet model achieves 54.4% mAP, only 2.4 points lower than the supervised AlexNet.

Image rotation prediction framework

Given four possible geometric transformations, namely the 0, 90, 180 and 270 degree rotations, a convolutional network model F(·) is trained to identify which rotation was applied to its input image.

F^y(X^{y*}) is the probability that the model F(·) assigns to rotation y when its input is the image X transformed by rotation y*; in other words, the model takes a rotated image as input and outputs a probability distribution over the four possible rotation angles.

To successfully predict the rotation applied to an image, the ConvNet must learn to localize the salient objects in the image, recognize their orientation and object type, and then relate that orientation to the object's canonical (upright) pose.
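As a concrete illustration, here is a minimal sketch of the four-way rotation pretext task (not the authors' code; make_rotation_batch is a hypothetical helper). Every unlabeled image yields four training examples whose labels come for free from the rotation applied:

import numpy as np

def make_rotation_batch(images):
    """Return each image in all four rotations plus the rotation index
    (0..3, i.e. 0/90/180/270 degrees) as its pseudo-label."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):
            rotated.append(np.rot90(img, k))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

# x: a batch of unlabeled square images; the ConvNet F is then trained with
# ordinary 4-class cross-entropy on (x_rot, y_rot) -- no human labels needed
x = np.random.rand(8, 32, 32, 3)
x_rot, y_rot = make_rotation_batch(x)
print(x_rot.shape, y_rot.shape)  # (32, 32, 32, 3) (32,)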

Attention maps generated by trained AlexNet models: (a) trained to recognize objects (supervised) and (b) trained to recognize image rotations (self-supervised).

The attention maps above are computed from the activation magnitudes of each spatial unit of a convolutional layer; they essentially show where the network focuses most of its attention in order to classify the input image.

From these maps, it can be seen that the supervised and the self-supervised model focus on roughly the same image regions.

Rotate-and-drag captcha solution

Have you ever been troubled by a rotation captcha? Yes, that is today's topic: the rotation captcha.


When simulating a login, the image captcha is one of the biggest obstacles.

But with RotNet, the rotate-and-drag captcha problem can be solved easily.

Two ideas

Predicting the rotation of an image can be framed in two ways: regression and classification.

  • Regression: predict a continuous value in the range 0 to 360°
  • Classification: predict one of 360 classes; the model outputs the class with the highest probability

A convolutional neural network is then defined and trained on a set of rotated images to predict each image's rotation angle; the sketch below shows how such training pairs can be generated.
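To make the classification idea concrete, here is a minimal sketch of generating one rotated training pair (assuming OpenCV; rotate_pair is a hypothetical helper, not the actual data generator used later):

import numpy as np
import cv2

def rotate_pair(img, nb_classes=360):
    """Rotate an image by a random integer angle and return it together
    with a one-hot label over the nb_classes possible angles."""
    angle = np.random.randint(nb_classes)       # ground-truth rotation
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))
    label = np.zeros(nb_classes, dtype=np.float32)
    label[angle] = 1.0                          # one-hot target
    return rotated, label

At inference time, the predicted class index is the angle in degrees, and rotating the image by its negative rights it.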

Big data application competition

Big data application competition: computer vision is widely used across AI, in applications such as autonomous driving, visual navigation, object detection and object recognition. Image processing techniques (random cropping, random rotation, image blurring and so on) often help improve computer-vision models, so their importance is self-evident. The topic of this big data application competition is therefore an image-uprighting challenge: predict the rotation applied to an image so that it can be turned right side up.

Convolutional neural network

Classification code:

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# input image size (example values for 64x64 RGB captcha images)
img_rows, img_cols, img_channels = 64, 64, 3

# number of convolutional filters to use
nb_filters = 64
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
# number of classes
nb_classes = 360

# model definition
img_input = Input(shape=(img_rows, img_cols, img_channels))
x = Conv2D(nb_filters, kernel_size, activation='relu')(img_input)
x = Conv2D(nb_filters, kernel_size, activation='relu')(x)
x = MaxPooling2D(pool_size=pool_size)(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.25)(x)
x = Dense(nb_classes, activation='softmax')(x)

model = Model(inputs=img_input, outputs=x)

model.summary()

Model compilation

# model compilation
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=[angle_error])
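angle_error is a custom metric imported from the project's utils module. Plain accuracy would be misleading here, since predicting 359° for a true angle of 1° is almost perfect; what matters is the circular distance between angles. A minimal sketch of such a metric (in the spirit of the d4nst/RotNet repository, assuming the Keras backend API):

from keras import backend as K

def angle_difference(x, y):
    # circular distance between two angles in degrees, in [0, 180]
    return 180 - K.abs(K.abs(x - y) - 180)

def angle_error(y_true, y_pred):
    # mean circular error between the true and predicted angle classes
    x = K.cast(K.argmax(y_true), K.floatx())
    y = K.cast(K.argmax(y_pred), K.floatx())
    return K.mean(angle_difference(x, y))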

Training parameters

# training parameters
batch_size = 128
nb_epoch = 50

Callbacks

# callbacks (model_name / output_folder are example values)
import os
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard

model_name = 'rotnet_captcha'
output_folder = 'models'
os.makedirs(output_folder, exist_ok=True)

checkpointer = ModelCheckpoint(
    filepath=os.path.join(output_folder, model_name + '.hdf5'),
    save_best_only=True
)
early_stopping = EarlyStopping(patience=2)
tensorboard = TensorBoard()

Model training

# training loop (X_train / X_test and the sample counts nb_train_samples /
# nb_test_samples are assumed to be loaded elsewhere)
model.fit_generator(
    RotNetDataGenerator(
        X_train,
        batch_size=batch_size,
        preprocess_func=binarize_images,
        shuffle=True
    ),
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=nb_epoch,
    validation_data=RotNetDataGenerator(
        X_test,
        batch_size=batch_size,
        preprocess_func=binarize_images
    ),
    validation_steps=nb_test_samples // batch_size,
    verbose=1,
    callbacks=[checkpointer, early_stopping, tensorboard]
)
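binarize_images is the preprocessing hook passed to the generator; its implementation is not shown in this post. A plausible sketch, assuming it simply thresholds the captcha images to black and white:

import numpy as np

def binarize_images(images):
    # scale pixel values to [0, 1], then threshold to a binary image
    images = images.astype(np.float32) / 255.0
    return (images > 0.1).astype(np.float32)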

Complete code

"""
@Author: ZS
@CSDN  : https://zsyll.blog.csdn.net/
@Time  : 2021/11/20 10:48
"""
from __future__ import print_function

import os
import sys

from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard, ReduceLROnPlateau
from keras.applications.resnet50 import ResNet50
from keras.applications.imagenet_utils import preprocess_input
from keras.models import Model
from keras.layers import Dense, Flatten
from keras.optimizers import SGD

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils import angle_error, RotNetDataGenerator
from getImagePath import getPath

data_path = r'./data/image/'
train_filenames, test_filenames = getPath(data_path)

print(len(train_filenames), 'train samples')
print(len(test_filenames), 'test samples')

model_name = 'rotnet_resnet50'

# Classification quantity
nb_classes = 360
# input image shape
input_shape = (320, 320, 3)

# Load base model
base_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=input_shape)

# Add classification layer
x = base_model.output
x = Flatten()(x)
final_output = Dense(nb_classes, activation='softmax', name='fc360')(x)

# Create a new model
model = Model(inputs=base_model.input, outputs=final_output)

model.summary()

# Model compilation
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9),
              metrics=[angle_error])

# Training parameters
batch_size = 64
nb_epoch = 20

output_folder = 'models'
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# callbacks
monitor = 'val_angle_error'
checkpointer = ModelCheckpoint(
    filepath=os.path.join(output_folder, model_name + '.hdf5'),
    monitor=monitor,
    save_best_only=True
)

reduce_lr = ReduceLROnPlateau(monitor=monitor, patience=3)
early_stopping = EarlyStopping(monitor=monitor, patience=5)
tensorboard = TensorBoard()

# Training model
model.fit_generator(
    RotNetDataGenerator(
        train_filenames,
        input_shape=input_shape,
        batch_size=batch_size,
        preprocess_func=preprocess_input,
        crop_center=True,
        crop_largest_rect=True,
        shuffle=True
    ),
    steps_per_epoch=len(train_filenames) // batch_size,
    epochs=nb_epoch,
    validation_data=RotNetDataGenerator(
        test_filenames,
        input_shape=input_shape,
        batch_size=batch_size,
        preprocess_func=preprocess_input,
        crop_center=True,
        crop_largest_rect=True
    ),
    validation_steps=len(test_filenames) // batch_size,
    callbacks=[checkpointer, reduce_lr, early_stopping, tensorboard],
    workers=10
)
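Once the model is trained, righting an image is just a matter of taking the argmax of the 360-way softmax and rotating by the negative of that angle. A minimal sketch (rotate_back is a hypothetical helper; it assumes the same preprocess_input used in training and counter-clockwise training rotations):

import numpy as np
import cv2
from keras.applications.imagenet_utils import preprocess_input

def rotate_back(model, image):
    """Predict the rotation angle of `image` and rotate it back upright."""
    x = cv2.resize(image, (320, 320)).astype(np.float32)
    x = preprocess_input(np.expand_dims(x, axis=0))   # same as training
    angle = int(np.argmax(model.predict(x)))          # class index = degrees
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), -angle, 1.0)
    return cv2.warpAffine(image, M, (w, h))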

Model call

# In the import area, sys is required; import the rest as needed
from __future__ import print_function
import os
import sys
import random
import numpy as np
import pandas as pd
import cv2
import tensorflow as tf
import tensorflow.keras as keras



import matplotlib.pyplot as plt
from mykeras.applications.imagenet_utils import preprocess_input
from mykeras.models import load_model
from utils import display_examples, RotNetDataGenerator, angle_error
import warnings
warnings.filterwarnings("ignore")
from tensorflow.keras import layers

# Code area, write according to requirements
class FileSequence(keras.utils.Sequence):
    """Keras Sequence that loads files in batches: `filefunc` turns a
    filename into an array; `labelfunc` optionally transforms the labels."""
    def __init__(self, filenames, batch_size, filefunc, fileargs=(),
                 labels=None, labelfunc=None, labelargs=(), shuffle=False):
        if labels: assert len(filenames) == len(labels)
        self.filenames  = filenames
        self.batch_size = batch_size
        self.filefunc   = filefunc
        self.fileargs   = fileargs
        self.labels     = labels
        self.labelfunc  = labelfunc
        self.labelargs  = labelargs  
        if shuffle:
            idx_list = list(range(len(self.filenames)))
            random.shuffle(idx_list)
            self.filenames = [self.filenames[idx] for idx in idx_list]
            if self.labels: self.labels = [self.labels[idx] for idx in idx_list]

    def __len__(self):
        return int(np.ceil(len(self.filenames) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_filenames = self.filenames[idx * self.batch_size: (idx+1) * self.batch_size]
        
        files = []
        for filename in batch_filenames:
            # tf.print(filename)
            file = self.filefunc(filename,*self.fileargs)
            files.append(file)
        if self.labels:
            batch_labels = self.labels[idx * self.batch_size: (idx+1) * self.batch_size]
            if self.labelfunc:
                return np.array(files), self.labelfunc(batch_labels,*self.labelargs)
            else:
                return np.array(files), batch_labels
        else:
            return np.array(files)

def fillWhite(img, size, mode=None):
    # Pad the image onto a size x size canvas (despite the name, the
    # padding value is 0, i.e. black)
    if len(img.shape) == 2: img = img.reshape(*img.shape,-1)
    assert len(img.shape) == 3
    h, w, c = img.shape
    assert (h < size) and (w < size)
    fillImg = np.zeros(shape=(size,size,c))
    if mode == "random":
        sh = random.randint(0,size-h)
        sw = random.randint(0,size-w)
        fillImg[sh:sh+h,sw:sw+w,...] = img
    elif mode == "centre" or mode == "center":
        fillImg[(size-h)//2:(size+h)//2,(size-w)//2:(size+w)//2,...] = img
    else:
        fillImg[:h,:w,...] = img
    return fillImg

def cropImg(img, size, mode=None):
    # Crop a size x size window from an image at least `size` in both dims
    if len(img.shape) == 2: img = img.reshape(*img.shape,-1)
    assert len(img.shape) == 3
    h, w, c = img.shape
    assert (h >= size) and (w >= size)
    if mode == "random":
        sh = random.randint(0,h-size)
        sw = random.randint(0,w-size)
        cropImg = img[sh:sh+size,sw:sw+size,...]
    elif mode == "centre" or mode == "center":
        cropImg = img[(h-size)//2:(h+size)//2,(w-size)//2:(w+size)//2,...]
    else:
        cropImg = img[:size,:size,...]
    return cropImg

def fillCrop(img, size, mode=None):
    # Pad one dimension and crop the other when the image exceeds `size`
    # on exactly one side
    if len(img.shape) == 2: img = img.reshape(*img.shape,-1)
    assert len(img.shape) == 3
    h, w, c = img.shape
    assert ((h >= size) and (w < size)) or ((h < size) and (w >= size))
    fillcropImg = np.zeros(shape=(size,size,c))
    if mode == "random":
        if (h >= size) and (w < size):
            sh = random.randint(0,h-size)
            sw = random.randint(0,size-w)
            fillcropImg[:,sw:sw+w,:] = img[sh:sh+size,...]
        else:
            sh = random.randint(0,size-h)
            sw = random.randint(0,w-size)
            fillcropImg[sh:sh+h,...] = img[:,sw:sw+size,:]
    elif mode == "centre" or mode == "center":
        if (h >= size) and (w < size):
            fillcropImg[:,(size-w)//2:(size+w)//2,:] = img[(h-size)//2:(h+size)//2,...]
        else:
            fillcropImg[(size-h)//2:(size+h)//2,...] = img[:,(w-size)//2:(w+size)//2,:]
    else:
        if (h >= size) and (w < size):
            fillcropImg[:,:size,:] = img[:size,...]
        else:
            fillcropImg[:size,...] = img[:,:size,:]
    return fillcropImg

def resizeImg(img, size, mode=None):
    # Dispatch to pad, crop, or pad+crop depending on the image size
    if len(img.shape) == 2: img = img.reshape(*img.shape,-1)
    assert len(img.shape) == 3
    h, w, c = img.shape
    if (h < size) and (w < size): return fillWhite(img,size,mode)
    elif (h >= size) and (w >= size): return cropImg(img,size,mode)
    else: return fillCrop(img,size,mode)

def filefunc(filename, mode):
    # Load an image and normalize it to 64x64 via resizeImg
    img = cv2.imread(filename)
    if img is None:  # cv2.imread returns None for unreadable files
        raise IOError("cannot read image: %s" % filename)
    h, w, c = img.shape
    if (h >=256) or (w >= 256):
        img = resizeImg(img,256,mode)
        img = cv2.resize(img,(64,64))
    elif (h >=128) or (w >= 128):
        img = resizeImg(img,128,mode)
        img = cv2.resize(img,(64,64))
    else:
        img = resizeImg(img,64,mode)
    return img    

# Main function, fixed format: to_pred_dir is the folder of images to predict;
# result_save_path is the path where the prediction results are written
# The following is an example
def main(to_pred_dir, result_save_path):
    runpyp = os.path.abspath(__file__)
    modeldirp = os.path.dirname(runpyp)
    modelp = os.path.join(modeldirp,"model.hdf5")
    model = load_model(modelp, custom_objects={'angle_error': angle_error})  # custom object

    pred_imgs = os.listdir(to_pred_dir)
    pred_imgsp_lines = [os.path.join(to_pred_dir,p) for p in pred_imgs]

    name, label = display_examples(
        model,
        pred_imgsp_lines,
        num_images=len(pred_imgsp_lines),
        size=(224, 224),
        crop_center=True,
        crop_largest_rect=True,
        preprocess_func=preprocess_input,
    )

    
    # the contest expects the header image_id,label (see the note below)
    df = pd.DataFrame({"image_id": name, "label": label})
    df.to_csv(result_save_path, index=None)

# !!! Note:
# In the contest, to_pred_dir is a folder containing the images to predict:
# to_pred_dir/to_pred_0.png
# to_pred_dir/to_pred_1.png
# to_pred_dir/......
# The generated csv must have the header image_id,label, for example:
# image_id,label
# to_pred_0,4
# to_pred_1,76
# to_pred_2,...

if __name__ == "__main__":
    to_pred_dir = sys.argv[1]  # Folder path to be predicted
    result_save_path = sys.argv[2]  # File path for saving prediction results
    main(to_pred_dir, result_save_path)

Reference: link

Come on!

Thank you!

Keep striving!

Keywords: Python, Computer Vision, Deep Learning
