A concise end-to-end implementation of cat image classification based on PaddleX

  • PaddlePaddle full-process development tool - PaddleX
  • PaddleX, the full-process development tool of PaddlePaddle, integrates all the capabilities required for deep learning development, such as the PaddlePaddle core framework, model zoo, tools, and components, and covers the entire deep learning development workflow.
  • PaddleX also provides a concise Python API and a graphical development client that can be downloaded and installed with one click. Users can choose the development mode that fits their actual production needs to get the best full-process development experience with PaddlePaddle.

Data description

This competition asks contestants to classify 12 kinds of cats, a classic image classification task in the CV field. As the cornerstone of other image tasks, image classification is a fast way to get started with computer vision. The competition dataset contains pictures of 12 kinds of cats and is divided into a training set and a test set.

  1. Training set: high-definition color pictures are provided together with their class labels; 2160 cat pictures in total, including an annotation file (see the format sketch below).
  2. Test set: only color pictures are provided, 240 cat pictures in total, without label files.
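  • A hypothetical illustration of the annotation format in train_list.txt, inferred from the parsing code in section 2.2 (tab-separated relative image path and class index 0-11; the file names shown are placeholders, not real entries):

cat_12_train/<32-character-random-name>.jpg    0
cat_12_train/<32-character-random-name>.jpg    11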

1 Import dependencies

!pip install paddlex
import warnings
warnings.filterwarnings('ignore')

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import paddle
import paddlex as pdx
from paddlex import transforms as T

import numpy as np
import pandas as pd
import shutil
import glob

import cv2
import imghdr
from PIL import Image

2 Data cleaning

  • Generate an ImageNet-format dataset folder; the target directory structure is as follows:
Dataset/ # Image classification dataset root directory
|--class A/  # All pictures in the current folder belong to category A
|  |--a_1.jpg
|  |--a_2.jpg
|  |--...
|  |--...
|
|--...
|
|--class Z/ # All pictures in the current folder belong to category Z
|  |--z1.jpg
|  |--z2.jpg
|  |--...
|  |--...
  • Because the training set provides a train_list.txt that stores the corresponding category information, we first put all the images into the corresponding category folders, and then let PaddleX split the data automatically.

2.1 Decompress the datasets

pdx.utils.decompress('data/data10954/cat_12_train.zip')
pdx.utils.decompress('data/data10954/cat_12_test.zip')

Generate 12 category folders.

for i in range(12):
    cls_path = os.path.join('data/data10954/ImageNetDataset/', '%02d' % int(i))
    if not os.path.exists(cls_path):
        os.makedirs(cls_path)
  • Why use 00 / 01 / ... as category names? Because PaddleX sorts categories by string when splitting the dataset, numeric names such as 2 / 3 / ... would end up after 10 / 11 once the data is split (see the sketch below).
  • We want the category numbers output by the model to be consistent with our folder names (i.e. the numbers submitted to the competition), so they are formatted as two digits.
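  • A quick illustration of the string-sorting behavior (a minimal sketch, not part of the pipeline):

sorted(['2', '3', '10', '11'])    # ['10', '11', '2', '3']  - numeric order is lost
sorted(['02', '03', '10', '11'])  # ['02', '03', '10', '11'] - zero-padding preserves the order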

2.2 Clean abnormal formats

  • Build a one-to-one mapping between file names and categories, then move the images into the target folders according to their category cls: data/data10954/ImageNetDataset/*/*.jpg.
train_df = pd.read_csv('data/data10954/train_list.txt', header=None, sep='\t')
train_df.columns = ['name', 'cls']
train_df['name'] = train_df['name'].apply(lambda x: str(x).strip().split('/')[-1])
train_df['cls'] = train_df['cls'].apply(lambda x: '%02d' % int(str(x).strip()))
train_df.head()
   name                                  cls
0  8GOkTtqw7E6IHZx4olYnhzvXLCiRsUfM.jpg  00
1  hwQDH3VBabeFXISfjlWEmYicoyr6qK1p.jpg  00
2  RDgZKvM6sp3Tx9dlqiLNEVJjmcfQ0zI4.jpg  00
3  ArBRzHyphTxFS2be9XLaU58m34PudlEf.jpg  00
4  kmW7GTX6uyM2A53NBZxibYRpQnIVatCH.jpg  00
  • The model input should be a three-channel RGB image; if imghdr.what() cannot recognize the image format, the file is deleted.
for i in range(len(train_df)):
    img_path = os.path.join('data/data10954/cat_12_train', train_df.at[i, 'name'])

    if os.path.exists(img_path) and imghdr.what(img_path):
        img = Image.open(img_path)
        if img.mode != 'RGB':
            img = img.convert('RGB')
            img.save(img_path)
    else:
        os.remove(img_path)
        print('delete:', img_path)
delete: data/data10954/cat_12_train/ieOvwupZbC4Xckj73znWxo0ARMKD5FrP.jpg
delete: data/data10954/cat_12_train/ovY2atRg8fsZ4jTbKC0UJIOd7mlPEy9u.jpg
  • Move each image from the source path src_path to the destination path dst_path.
  • Note: the previous step only deleted the abnormal images without updating the DataFrame, so try/except is used to ignore records that still exist in the DataFrame but whose image files no longer exist.
for i in range(len(train_df)):
    src_path = os.path.join(
        'data/data10954/cat_12_train',
        train_df.at[i, 'name'])

    dst_path = os.path.join(
        os.path.join(
            'data/data10954/ImageNetDataset/',
            train_df.at[i, 'cls']),
        train_df.at[i, 'name'])

    try:
        shutil.move(src_path, dst_path)
    except Exception as e:
        print(e)
[Errno 2] No such file or directory: 'data/data10954/cat_12_train/ieOvwupZbC4Xckj73znWxo0ARMKD5FrP.jpg'
[Errno 2] No such file or directory: 'data/data10954/cat_12_train/ovY2atRg8fsZ4jTbKC0UJIOd7mlPEy9u.jpg'

3 Dataset splitting

!paddlex --split_dataset --format ImageNet\
    --dataset_dir data/data10954/ImageNetDataset\
    --val_value 0.15\
    --test_value 0

4 Data transforms and loading

train_transforms = T.Compose([
    T.MixupImage(mixup_epoch=115),
    T.ResizeByShort(short_size=256),
    T.RandomCrop(crop_size=224, aspect_ratio=[0.75, 1.25], scaling=[0.3, 1.0]),
    T.RandomHorizontalFlip(0.5),
    T.RandomDistort(
        brightness_range=0.4, brightness_prob=0.5,
        contrast_range=0.4, contrast_prob=0.5,
        saturation_range=0.4, saturation_prob=0.5,
        hue_range=18, hue_prob=0.5),
    T.RandomBlur(0.05),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

eval_transforms = T.Compose([
    T.ResizeByShort(short_size=256),
    T.CenterCrop(crop_size=224),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
train_dataset = pdx.datasets.ImageNet(
    data_dir='data/data10954/ImageNetDataset',
    file_list='data/data10954/ImageNetDataset/train_list.txt',
    label_list='data/data10954/ImageNetDataset/labels.txt',
    transforms=train_transforms,
    shuffle=True)

eval_dataset = pdx.datasets.ImageNet(
    data_dir='data/data10954/ImageNetDataset',
    file_list='data/data10954/ImageNetDataset/val_list.txt',
    label_list='data/data10954/ImageNetDataset/labels.txt',
    transforms=eval_transforms)

5 Model configuration and training

model = pdx.cls.ResNet101_vd_ssld(num_classes=len(train_dataset.labels))

5.1 Learning rate strategy

The learning rate strategy and parameters follow the classic ImageNet training recipe:

  1. LinearWarmup: warm up the learning rate for 5 epochs (warmup_steps = step_each_epoch * 5);
  2. PiecewiseDecay: step-wise learning rate decay, where the learning rate drops to 0.1x of its previous value every 30 epochs, for 120 epochs in total (learning_rate * (0.1**i));
  3. The Momentum optimizer is used, with an L2 weight regularization coefficient of 0.0005;
  4. The batch size and initial learning rate scale linearly from 256 / 0.1 down to 64 / 0.025; considering the small dataset and the use of pretrained weights, the learning rate is further reduced to 0.0125 at batch size 64 (see the sketch below).
  • In the notebook you can type ?paddle.optimizer.lr.PiecewiseDecay to see how to use this class, and likewise ?paddle.optimizer.lr.CosineAnnealingDecay for cosine decay.
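  • The initial learning rate from item 4 can be reproduced with the linear scaling rule; a minimal sketch of the arithmetic only (the training code below simply hard-codes the result):

base_batch_size, base_lr = 256, 0.1
train_batch_size = 64
scaled_lr = base_lr * train_batch_size / base_batch_size  # 0.025 by linear scaling
scaled_lr = scaled_lr / 2                                  # reduced again for the small, pretrained setup
print(scaled_lr)  # 0.0125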
num_epochs = 120
learning_rate = 0.0125
lr_decay_epochs = [30, 60, 90]
train_batch_size = 64
step_each_epoch = train_dataset.num_samples // train_batch_size

boundaries = [b * step_each_epoch for b in lr_decay_epochs]
values = [learning_rate * (0.1**i) for i in range(len(lr_decay_epochs) + 1)]
lr = paddle.optimizer.lr.PiecewiseDecay(
    boundaries=boundaries,
    values=values)

lr = paddle.optimizer.lr.LinearWarmup(
    learning_rate=lr,
    warmup_steps=step_each_epoch * 5,
    start_lr=0.0,
    end_lr=learning_rate)

optimizer = paddle.optimizer.Momentum(
    learning_rate=lr,
    momentum=0.9,
    weight_decay=paddle.regularizer.L2Decay(0.0005),
    parameters=model.net.parameters())

5.2 Start training

  • You can run watch -n 0 nvidia-smi in a terminal to monitor GPU memory usage, and adjust the image transform size, batch size, and other parameters as needed.
  • GPU memory usage for this configuration: 9354MiB / 16384MiB.
  • Because the validation split is small and Mixup data augmentation is used, a model with high metrics on the validation set early in training may be less robust than one saved near the end of training.
model.train(
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,

    num_epochs=num_epochs,
    train_batch_size=train_batch_size,
    optimizer=optimizer,

    save_interval_epochs=1,
    log_interval_steps=step_each_epoch * 5,

    pretrain_weights='IMAGENET',
    save_dir='output/ResNet101_vd_ssld',
    use_vdl=True)

6 Evaluation and prediction

6.1 Model evaluation

model = pdx.load_model('output/ResNet101_vd_ssld/best_model')

Different evaluation transforms also lead to different results; for example, the original transform eval_transforms_origin and the modified transform eval_transforms_modify below give slightly different metrics.

eval_transforms_origin = T.Compose([
    T.ResizeByShort(short_size=256),
    T.CenterCrop(crop_size=224),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

eval_transforms_modify = T.Compose([
    T.Resize(target_size=224),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
eval_dataset = pdx.datasets.ImageNet(
    data_dir='data/data10954/ImageNetDataset',
    file_list='data/data10954/ImageNetDataset/val_list.txt',
    label_list='data/data10954/ImageNetDataset/labels.txt',
    transforms=eval_transforms_origin)

model.evaluate(eval_dataset=eval_dataset, batch_size=64)
OrderedDict([('acc1', 0.9713542), ('acc5', 1.0)])
eval_dataset = pdx.datasets.ImageNet(
    data_dir='data/data10954/ImageNetDataset',
    file_list='data/data10954/ImageNetDataset/val_list.txt',
    label_list='data/data10954/ImageNetDataset/labels.txt',
    transforms=eval_transforms_modify)

model.evaluate(eval_dataset=eval_dataset, batch_size=64)
OrderedDict([('acc1', 0.9583333), ('acc5', 1.0)])
  • Note that if eval_dataset is not specified during training, the model saved during training will not carry built-in evaluation transforms. In that case the transforms must be passed explicitly as model.predict(image, transforms=eval_transforms) (see the sketch after the output below).
# We set eval_dataset during training, so the saved model carries the evaluation transforms
model.get_model_info()['Transforms']
[{'ResizeByShort': {'short_size': 256, 'max_size': -1, 'interp': 'LINEAR'}},
 {'CenterCrop': {'crop_size': 224}},
 {'Normalize': {'mean': [0.485, 0.456, 0.406],
   'std': [0.229, 0.224, 0.225],
   'min_val': [0, 0, 0],
   'max_val': [255.0, 255.0, 255.0],
   'is_scale': True}}]
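  • If the model had been trained without eval_dataset, the transforms would have to be passed explicitly. A minimal sketch, assuming the eval_transforms_origin defined above (the image path is a hypothetical placeholder):

image = cv2.imread('data/data10954/cat_12_test/<some_image>.jpg')  # hypothetical example path
result = model.predict(image, transforms=eval_transforms_origin)
print(result[0]['category_id'], result[0]['score'])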

6.2 Model prediction

Sample a few images from the validation set for visualization.

df_val = pd.read_csv('data/data10954/ImageNetDataset/val_list.txt', header=None, sep='\s+')
df_val.columns = ['path', 'cls']
df_val = df_val.sample(n=12, replace=False)
df_val.index = range(len(df_val))
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(8, 10))
for i in range(12):
    plt.subplot(4, 3, i+1)
    plt.axis('off')
    image = cv2.imread(os.path.join('data/data10954/ImageNetDataset', df_val.at[i, 'path']))
    result = model.predict(image)[0]
    plt.title("%d (True) / %d (Predict) - %.4f" % (df_val.at[i, 'cls'], result['category_id'], result['score']))
    plt.imshow(image[:, :, [2, 1, 0]])

plt.tight_layout()
plt.show()

  • Generate the competition submission file work/result.csv, then download it from the file panel in the left column.
  • To avoid problems reading abnormal images in the test set, PIL's Image.open(...).convert('RGB') is used here as well to normalize the image mode before converting to a numpy.ndarray (note that the RGB channels are reordered to BGR).
test_list = sorted(glob.glob('data/data10954/cat_12_test/*.jpg'))
test_df = pd.DataFrame()

for i in range(len(test_list)):
    img = Image.open(test_list[i]).convert('RGB')
    img = np.asarray(img, dtype='float32')
    img = img[:, :, [2, 1, 0]]

    result = model.predict(img)

    test_df.at[i, 'name'] = str(test_list[i]).split('/')[-1]
    test_df.at[i, 'cls'] = int(result[0]['category_id'])

test_df[['name']] = test_df[['name']].astype(str)
test_df[['cls']] = test_df[['cls']].astype(int)

test_df.to_csv('work/result.csv', index=False, header=False)
  • Download the result file and submit the generated result.csv to the competition "Question 1: Twelve-class cat classification" to earn compute quota and points (the submission saved in this version scored 0.954).

7 Project summary

  • The above is the whole process of applying PaddleX to the twelve-class cat image classification competition.
  • Note that code complexity is not necessarily positively correlated with the submission score; the specific steps and parameters can be adjusted as you see fit.
  • Information about model interpretability can be found in the PaddleX 1.3.11 documentation.
  • More image classification tricks can be found in PaddleClas - Tricks.

Keywords: Computer Vision Deep Learning paddlepaddle
