ConvNeXt PaddlePaddle version: pre-trained weights

Reproduced from AI Studio

Project link: https://aistudio.baidu.com/aistudio/projectdetail/3476364

Project introduction

This project provides pre-trained weights and the model file for the PaddlePaddle version of ConvNeXt. The weights were converted from PyTorch and their accuracy was verified on the ImageNet-1K validation set. See ConvNeXt.py for the model file. The project only aims to provide the PaddlePaddle weights, the model file, and the accuracy verification; it does not explain the model itself. For an in-depth understanding of the model, please refer to:
ConvNeXt: exploring the ultimate potential of CNNs

ConvNeXt

See ConvNeXt.py for the model file. Below are some points to watch when converting a PyTorch model to PaddlePaddle.

Linear layer weights must be transposed when they are transferred from PyTorch to PaddlePaddle.
The DropPath layer has to be implemented by hand. I referred to this article: PiT: a vision Transformer network combined with pooling layers.
For arguments that name a dimension, PaddlePaddle uses axis where PyTorch uses dim.
Sometimes custom parameters are needed; the corresponding PaddlePaddle API is documented here.
PaddlePaddle has no permute; use transpose instead, which takes the full axis order just as permute does.
For other simple API mappings, see here.
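
The first point above can be sketched with plain NumPy arrays. This is only an illustration, not the conversion script used for this project: the function name, the `linear_keys` argument, and the parameter names in the toy state dict are all hypothetical. It assumes the PyTorch weights have already been exported to NumPy arrays.

```python
import numpy as np


def torch_to_paddle_state(torch_state, linear_keys):
    """Return a copy of `torch_state` with linear-layer weights transposed.

    torch_state: dict mapping parameter names to NumPy arrays.
    linear_keys: names of weights that belong to linear layers. PyTorch
                 stores them as [out_features, in_features]; PaddlePaddle
                 stores them as [in_features, out_features].
    """
    paddle_state = {}
    for name, value in torch_state.items():
        value = np.asarray(value)
        if name in linear_keys:
            value = value.T  # swap the two axes of the 2-D weight
        paddle_state[name] = np.ascontiguousarray(value)
    return paddle_state


# Toy example with hypothetical parameter names.
torch_state = {
    "stem.weight": np.zeros((96, 3, 4, 4), dtype="float32"),  # conv kernel: unchanged
    "head.weight": np.zeros((1000, 768), dtype="float32"),    # linear weight: transposed
    "head.bias": np.zeros((1000,), dtype="float32"),          # bias: unchanged
}
paddle_state = torch_to_paddle_state(torch_state, linear_keys={"head.weight"})
print({name: value.shape for name, value in paddle_state.items()})
```

Passing the linear-layer names explicitly, rather than transposing every 2-D array, avoids accidentally flipping other 2-D parameters a model might contain.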

On lines 134 and 135 of the .py file, weight and bias are multiplied by 1, which does nothing. This follows the PyTorch code; I do not yet know how to write the equivalent in PaddlePaddle, and would appreciate a reply from anyone who does.

Weights

This section introduces the weight files used in this accuracy verification. There are five in total, covering ConvNeXt models of different sizes and versions.
First, an overview:

name	resolution	acc@1	#params	FLOPs
ConvNeXt-T	224x224	82.1	28M	4.5G
ConvNeXt-S	224x224	83.1	50M	8.7G
ConvNeXt-B	224x224	83.8	89M	15.4G
ConvNeXt-L	224x224	84.3	198M	34.4G
ConvNeXt-XL	384x384	87.8	350M	179.0G

Because of the limit on the number of files, only these five versions and their accuracy verification are shown here. The ImageNet-22K weights will be placed in a separate dataset, here. The dataset used in this project is "ConvNeXt pre-trained model, PaddlePaddle version"; you are welcome to download and use it. Now let's run the accuracy verification!

Accuracy verification

The accuracy verification code below follows code written by the AI Studio user "Lonely, you go in".

# Extract the ImageNet-1K validation set
!mkdir -p ~/data/ILSVRC2012
!tar -xf ~/data/data68594/ILSVRC2012_img_val.tar -C ~/data/ILSVRC2012

import os
import cv2
import numpy as np
import warnings
import paddle
import paddle.vision.transforms as T
from PIL import Image
warnings.filterwarnings('ignore')

# Build the dataset
class ILSVRC2012(paddle.io.Dataset):
    def __init__(self, root, label_list, transform, backend='pil'):
        self.transform = transform
        self.root = root
        self.label_list = label_list
        self.backend = backend
        self.load_datas()

    def load_datas(self):
        self.imgs = []
        self.labels = []
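        with open(self.label_list, 'r') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                # Each line has the form "<image file name> <integer label>"
                img, label = line.split(' ')
                self.imgs.append(os.path.join(self.root, img))
                self.labels.append(int(label))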

    def __getitem__(self, idx):
        label = self.labels[idx]
        image = self.imgs[idx]
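        if self.backend=='cv2':
            # OpenCV loads images as BGR; convert to RGB to match the PIL branch
            image = cv2.cvtColor(cv2.imread(image), cv2.COLOR_BGR2RGB)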
        else:
            image = Image.open(image).convert('RGB')
        image = self.transform(image)
        return image.astype('float32'), np.array(label).astype('int64')

    def __len__(self):
        return len(self.imgs)


val_transforms = T.Compose([
    T.Resize(int(224 / 0.96), interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
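
The resize size in the transform above comes from dividing the crop size by a crop ratio of 0.96. A minimal sketch of that arithmetic follows; the function name is made up for illustration and is not part of PaddlePaddle.

```python
def resize_before_crop(crop_size, crop_ratio):
    """Shorter-side length to resize to before taking a center crop.

    The center crop then keeps `crop_ratio` of the resized shorter side.
    """
    return int(crop_size / crop_ratio)


resize_size = resize_before_crop(224, 0.96)
print(resize_size)  # 233

# Fraction of the resized shorter side that survives the 224-pixel crop.
print(round(224 / resize_size, 4))
```

When an integer is passed to T.Resize, the shorter side of the image is scaled to that length and the aspect ratio is kept, so the following T.CenterCrop(224) trims only a thin border.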

# Configure the models
from ConvNeXt import convnext_tiny, convnext_small, convnext_base, convnext_large, convnext_xlarge

cvt_t = convnext_tiny()
cvt_s = convnext_small()
cvt_b = convnext_base()
cvt_l = convnext_large()
cvt_x = convnext_xlarge()

W0211 21:12:49.976547   686 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0211 21:12:49.982038   686 device_context.cc:465] device: 0, cuDNN Version: 7.6.

cvt_t.load_dict(paddle.load('data/data127804/convnext_tiny_1k_224_ema.pdparams'))
cvt_t = paddle.Model(cvt_t)
cvt_t.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_s.load_dict(paddle.load('data/data127804/convnext_small_1k_224_ema.pdparams'))
cvt_s = paddle.Model(cvt_s)
cvt_s.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_b.load_dict(paddle.load('data/data127804/convnext_base_1k_224_ema.pdparams'))
cvt_b = paddle.Model(cvt_b)
cvt_b.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_l.load_dict(paddle.load('data/data127804/convnext_large_1k_224_ema.pdparams'))
cvt_l = paddle.Model(cvt_l)
cvt_l.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_x.load_dict(paddle.load('data/data127804/convnext_xlarge_22k_1k_384_ema.pdparams'))
cvt_x = paddle.Model(cvt_x)
cvt_x.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms, label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_t.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)

Eval begin...
step 391/391 [==============================] - acc_top1: 0.8199 - acc_top5: 0.9588 - 1s/step          
Eval samples: 50000
{'acc_top1': 0.81992, 'acc_top5': 0.9588}
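
For readers unfamiliar with the two metrics in this output, the sketch below shows in NumPy what top-1 and top-5 accuracy measure, and why the progress bar stops at step 391. It is an illustration on made-up scores, not the implementation inside paddle.metric.Accuracy; the function name is hypothetical.

```python
import math

import numpy as np


def topk_accuracy(logits, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    logits = np.asarray(logits)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (order inside the k is irrelevant).
    topk = np.argpartition(-logits, k - 1, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Made-up scores for 4 samples and 3 classes.
logits = np.array([
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])
labels = np.array([1, 0, 0, 1])

top1 = topk_accuracy(logits, labels, k=1)
top2 = topk_accuracy(logits, labels, k=2)
print(top1, top2)  # 0.5 0.75

# 50000 validation images in batches of 128 give 391 evaluation steps.
steps = math.ceil(50000 / 128)
print(steps)  # 391
```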

val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms, label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_s.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)

Eval begin...
step 391/391 [==============================] - acc_top1: 0.8308 - acc_top5: 0.9652 - 1s/step          
Eval samples: 50000
{'acc_top1': 0.83078, 'acc_top5': 0.96516}

val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms, label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_b.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)

Eval begin...
step 391/391 [==============================] - acc_top1: 0.8384 - acc_top5: 0.9683 - 2s/step          
Eval samples: 50000
{'acc_top1': 0.8384, 'acc_top5': 0.9683}

val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms, label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_l.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)

Eval begin...
step 391/391 [==============================] - acc_top1: 0.8435 - acc_top5: 0.9697 - 2s/step          
Eval samples: 50000
{'acc_top1': 0.84346, 'acc_top5': 0.9697}

# resize to 384

val384_transforms = T.Compose([
    T.Resize((384, 384), interpolation='bicubic'),
  #  T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val384_transforms, label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_x.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)

Eval begin...
step 391/391 [==============================] - acc_top1: 0.8775 - acc_top5: 0.9856 - 5s/step          
Eval samples: 50000
{'acc_top1': 0.8775, 'acc_top5': 0.98556}

Summary

For some models the measured accuracy differs slightly from the reference values in the table, by roughly 0.1 percentage points at most. This is hard to avoid, because the PyTorch-to-PaddlePaddle conversion and the data preprocessing are not exactly identical. Overall the differences are small, so the model conversion can be considered successful.
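
To make the comparison concrete, the sketch below subtracts the reference top-1 values from the weights table from the top-1 values printed by the evaluation runs above. All numbers are copied from this article; the variable names are only for illustration.

```python
# Reference top-1 accuracy (percent) from the weights table.
reported = {
    "ConvNeXt-T": 82.1,
    "ConvNeXt-S": 83.1,
    "ConvNeXt-B": 83.8,
    "ConvNeXt-L": 84.3,
    "ConvNeXt-XL": 87.8,
}

# Top-1 accuracy (fraction) printed by the evaluation runs.
measured = {
    "ConvNeXt-T": 0.81992,
    "ConvNeXt-S": 0.83078,
    "ConvNeXt-B": 0.8384,
    "ConvNeXt-L": 0.84346,
    "ConvNeXt-XL": 0.8775,
}

# Difference in percentage points: measured minus reported.
diffs = {name: measured[name] * 100 - reported[name] for name in reported}
for name, diff in diffs.items():
    print(f"{name}: {diff:+.3f}")

largest = max(diffs, key=lambda name: abs(diffs[name]))
max_gap = abs(diffs[largest])
print(f"largest gap: {largest} ({max_gap:.3f} percentage points)")
```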

ConvNeXt is, in my view, the strongest convolutional classification network at the time of writing. In a field now dominated by Transformers, ConvNeXt wins one back for convolutional networks with its strong modeling ability. To be honest, one advantage of convolution that Transformers cannot easily replace is that a convolutional network accepts inputs of any size, whereas handling this with a Transformer is more troublesome.
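
The point about input size can be illustrated with the spatial sizes inside ConvNeXt. The sketch assumes the standard ConvNeXt layout: a 4x4 stem convolution with stride 4, followed by three 2x2 downsampling convolutions with stride 2, all without padding. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded convolution along one spatial axis."""
    return (size - kernel) // stride + 1


def convnext_feature_sizes(input_size):
    """Spatial size after the stem and after each of the three downsampling layers."""
    sizes = [conv_out(input_size, kernel=4, stride=4)]          # stem
    for _ in range(3):
        sizes.append(conv_out(sizes[-1], kernel=2, stride=2))   # downsampling layer
    return sizes


for size in (224, 384):
    print(size, convnext_feature_sizes(size))
# 224 [56, 28, 14, 7]
# 384 [96, 48, 24, 12]
```

The final feature map is averaged over its spatial axes before the classification head, so the head receives a vector of the same length whether the last feature map is 7x7 or 12x12. This is why the same architecture can be evaluated at both 224x224 and 384x384 above.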

About the author

Name: Zhang Jin

School: Shanghai University of Applied Technology, second-year graduate student

Research interests: CV, salient object detection

AI Studio link: https://aistudio.baidu.com/aistudio/personalcenter/thirdview/635490

GitHub link: https://github.com/zhangjin12138

Personal honors: PPDE; CCF member; first place in the semi-supervised object localization track of the third China AI+ Innovation and Entrepreneurship Competition; first place in the CCF competition "Automatic recognition of water meter readings in real scenes"; one paper in an SCI Q2 journal, N more awaiting submission.

If you like this project, please give it a like.

Keywords: PyTorch, Deep Learning, PaddlePaddle

Added by webdesco on Sat, 19 Feb 2022 18:15:13 +0200
