Reproduced from AI Studio
Original project link: https://aistudio.baidu.com/aistudio/projectdetail/3476364
Project introduction
This project provides pre-trained weights and model files for a PaddlePaddle version of ConvNeXt. The weights were converted from PyTorch, and their accuracy has been verified on the ImageNet-1K validation set. The model definition is in ConvNeXt.py. The project only provides the PaddlePaddle model weights, the model file and the accuracy check; it does not explain how the model works. For an in-depth understanding of the model, please refer to:
ConvNeXt: explore the ultimate potential of CNN network
ConvNeXt
The model definition is in ConvNeXt.py. Here are some points to watch out for when converting a PyTorch model to PaddlePaddle:
- When transferring the weights of linear (fully connected) layers from PyTorch to PaddlePaddle, the weight matrix must be transposed.
- The DropPath layer has to be implemented by hand. I followed this article: PiT: visual Transformer network combined with pooling layer
- For arguments that specify a dimension, PaddlePaddle uses `axis`, whereas PyTorch uses `dim`.
- Sometimes you need to create custom parameters; the relevant PaddlePaddle API is described here.
- PaddlePaddle has no `permute`; use `transpose` instead.
- For other simple API mappings, see here.
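The first, second and fifth points in the list can be made concrete with a framework-neutral NumPy sketch (not the project's actual conversion script). `convert_state_dict` and `drop_path` are hypothetical helpers: the first transposes 2-D Linear weights under the assumption that every 2-D `*.weight` tensor belongs to a Linear layer, which holds for ConvNeXt but not for every model; the second follows the common stochastic-depth formulation (as used in timm). The check at the end confirms that, after the transpose, Paddle's `x @ W + b` matches PyTorch's `x @ W.T + b` on channels-last input.

```python
import numpy as np


def convert_state_dict(torch_state):
    """Map a PyTorch-style state dict (name -> ndarray) to Paddle's layout.

    Assumption: every 2-D '*.weight' tensor belongs to a Linear layer. True for
    ConvNeXt (conv kernels are 4-D, LayerNorm/gamma/bias tensors are 1-D), but
    e.g. embedding tables are also 2-D and must NOT be transposed.
    """
    paddle_state = {}
    for name, value in torch_state.items():
        if name.endswith(".weight") and value.ndim == 2:
            value = value.T  # PyTorch Linear stores [out, in]; Paddle Linear stores [in, out]
        paddle_state[name] = value
    return paddle_state


def drop_path(x, drop_prob=0.0, training=False, rng=None):
    """Stochastic depth: drop whole samples of a residual branch with prob. drop_prob."""
    if drop_prob == 0.0 or not training:
        return x  # identity at inference time
    keep_prob = 1.0 - drop_prob
    rng = np.random.default_rng() if rng is None else rng
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one draw per sample
    mask = np.floor(keep_prob + rng.random(mask_shape))  # 1 with prob. keep_prob, else 0
    return x / keep_prob * mask  # rescale so the expected value is unchanged


rng = np.random.default_rng(0)
C = 8
torch_state = {
    "pwconv1.weight": rng.standard_normal((4 * C, C)),   # PyTorch layout [out, in]
    "pwconv1.bias": rng.standard_normal(4 * C),
    "dwconv.weight": rng.standard_normal((C, 1, 7, 7)),  # conv kernel: same layout in both
}
paddle_state = convert_state_dict(torch_state)

# ConvNeXt applies its pointwise Linear layers channels-last:
# PyTorch x.permute(0, 2, 3, 1) corresponds to paddle.transpose(x, perm=[0, 2, 3, 1]).
x_nchw = rng.standard_normal((2, C, 5, 5))
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

y_torch = x_nhwc @ torch_state["pwconv1.weight"].T + torch_state["pwconv1.bias"]  # x W^T + b
y_paddle = x_nhwc @ paddle_state["pwconv1.weight"] + paddle_state["pwconv1.bias"]  # x W + b
print(np.allclose(y_torch, y_paddle))  # True
```

Conv kernels need no change because PyTorch and PaddlePaddle both store them as `[out_channels, in_channels/groups, kH, kW]`; only the Linear layers differ.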
Lines 134 and 135 of the .py file multiply the weight and bias by 1, which effectively does nothing. This is how the PyTorch code is written; I have not yet worked out how to write it in PaddlePaddle, so I would be grateful for any advice from experts.
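Weights
Below are the weight files used in this accuracy check. There are five in total, covering ConvNeXt models of different sizes and versions. As the file names indicate, the T/S/B/L weights were trained on ImageNet-1K at 224x224, while the XL weights (convnext_xlarge_22k_1k_384_ema) were pre-trained on ImageNet-22K and fine-tuned on ImageNet-1K at 384x384.
First, an overview: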
name | resolution | acc@1 (%) | #params | FLOPs |
---|---|---|---|---|
ConvNeXt-T | 224x224 | 82.1 | 28M | 4.5G |
ConvNeXt-S | 224x224 | 83.1 | 50M | 8.7G |
ConvNeXt-B | 224x224 | 83.8 | 89M | 15.4G |
ConvNeXt-L | 224x224 | 84.3 | 198M | 34.4G |
ConvNeXt-XL | 384x384 | 87.8 | 350M | 179.0G |
Because of the limit on the number of files, only these five versions and their accuracy checks are shown here; the ImageNet-22K weights are in a separate dataset here. The dataset used in this project is "ConvNeXt pre-training model PaddlePaddle version"; you are welcome to download and use it. Now let's verify the accuracy!
Accuracy verification
The accuracy-verification code below is adapted from code written by the AI Studio expert "Lonely, you go in".
```python
# Unzip the ImageNet-1K validation set
!mkdir data/ILSVRC2012
!tar -xf ~/data/data68594/ILSVRC2012_img_val.tar -C ~/data/ILSVRC2012
```
```python
import os
import warnings

import cv2
import numpy as np
import paddle
import paddle.vision.transforms as T
from PIL import Image

warnings.filterwarnings('ignore')


# Build the dataset
class ILSVRC2012(paddle.io.Dataset):
    def __init__(self, root, label_list, transform, backend='pil'):
        self.transform = transform
        self.root = root
        self.label_list = label_list
        self.backend = backend
        self.load_datas()

    def load_datas(self):
        # Each line of the label file: "<relative image path> <class id>"
        self.imgs = []
        self.labels = []
        with open(self.label_list, 'r') as f:
            for line in f:
                line = line.strip()  # safe even if the last line has no trailing newline
                if not line:
                    continue
                img, label = line.split()
                self.imgs.append(os.path.join(self.root, img))
                self.labels.append(int(label))

    def __getitem__(self, idx):
        label = self.labels[idx]
        image = self.imgs[idx]
        if self.backend == 'cv2':
            # cv2 loads BGR; convert to RGB so the ImageNet mean/std below apply
            image = cv2.cvtColor(cv2.imread(image), cv2.COLOR_BGR2RGB)
        else:
            image = Image.open(image).convert('RGB')
        image = self.transform(image)
        return image.astype('float32'), np.array(label).astype('int64')

    def __len__(self):
        return len(self.imgs)


# Resize the shorter side to int(224 / 0.96) = 233, then center-crop 224x224
val_transforms = T.Compose([
    T.Resize(int(224 / 0.96), interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```
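The 224x224 preprocessing resizes the shorter side to `int(224 / 0.96)` = 233 pixels and then keeps the central 224x224 window, i.e. roughly 96% of the resized shorter side. The pure-Python sketch below works through that arithmetic for a 500x375 photo. `resize_shorter_side` and `center_crop_box` are hypothetical helpers, and their truncation/rounding conventions follow torchvision; that PaddlePaddle's `T.Resize` and `T.CenterCrop` round identically is an assumption.

```python
def resize_shorter_side(width, height, size):
    """(new_w, new_h) after scaling the shorter side to `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop):
    """(left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


resize_to = int(224 / 0.96)                               # 233
new_w, new_h = resize_shorter_side(500, 375, resize_to)  # (310, 233)
box = center_crop_box(new_w, new_h, 224)                 # (43, 4, 267, 228)
print(resize_to, (new_w, new_h), box)
```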
```python
# Configure the models
from ConvNeXt import convnext_tiny, convnext_small, convnext_base, convnext_large, convnext_xlarge

cvt_t = convnext_tiny()
cvt_s = convnext_small()
cvt_b = convnext_base()
cvt_l = convnext_large()
cvt_x = convnext_xlarge()
```
```python
cvt_t.load_dict(paddle.load('data/data127804/convnext_tiny_1k_224_ema.pdparams'))
cvt_t = paddle.Model(cvt_t)
cvt_t.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_s.load_dict(paddle.load('data/data127804/convnext_small_1k_224_ema.pdparams'))
cvt_s = paddle.Model(cvt_s)
cvt_s.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_b.load_dict(paddle.load('data/data127804/convnext_base_1k_224_ema.pdparams'))
cvt_b = paddle.Model(cvt_b)
cvt_b.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_l.load_dict(paddle.load('data/data127804/convnext_large_1k_224_ema.pdparams'))
cvt_l = paddle.Model(cvt_l)
cvt_l.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

cvt_x.load_dict(paddle.load('data/data127804/convnext_xlarge_22k_1k_384_ema.pdparams'))
cvt_x = paddle.Model(cvt_x)
cvt_x.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))
```
```python
val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms,
                         label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_t.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)
```
```
Eval begin...
step 391/391 [==============================] - acc_top1: 0.8199 - acc_top5: 0.9588 - 1s/step
Eval samples: 50000
{'acc_top1': 0.81992, 'acc_top5': 0.9588}
```
```python
val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms,
                         label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_s.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)
```
```
Eval begin...
step 391/391 [==============================] - acc_top1: 0.8308 - acc_top5: 0.9652 - 1s/step
Eval samples: 50000
{'acc_top1': 0.83078, 'acc_top5': 0.96516}
```
```python
val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms,
                         label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_b.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)
```
```
Eval begin...
step 391/391 [==============================] - acc_top1: 0.8384 - acc_top5: 0.9683 - 2s/step
Eval samples: 50000
{'acc_top1': 0.8384, 'acc_top5': 0.9683}
```
```python
val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms,
                         label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_l.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)
```
```
Eval begin...
step 391/391 [==============================] - acc_top1: 0.8435 - acc_top5: 0.9697 - 2s/step
Eval samples: 50000
{'acc_top1': 0.84346, 'acc_top5': 0.9697}
```
```python
# Resize directly to 384x384 (no center crop) for the 384-resolution XL model
val384_transforms = T.Compose([
    T.Resize((384, 384), interpolation='bicubic'),
    # T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```
```python
val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val384_transforms,
                         label_list='data/data68594/val_list.txt', backend='pil')
# Model validation
acc = cvt_x.evaluate(val_dataset, batch_size=128, num_workers=0, verbose=1)
print(acc)
```
```
Eval begin...
step 391/391 [==============================] - acc_top1: 0.8775 - acc_top5: 0.9856 - 5s/step
Eval samples: 50000
{'acc_top1': 0.8775, 'acc_top5': 0.98556}
```
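The `acc_top1` and `acc_top5` values reported by `paddle.metric.Accuracy(topk=(1, 5))` are the fraction of images whose true label is among the k highest-scoring classes. The NumPy sketch below illustrates what that metric measures; `topk_accuracy` is a hypothetical helper, not Paddle's implementation.

```python
import numpy as np


def topk_accuracy(logits, labels, ks=(1, 5)):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    order = np.argsort(-logits, axis=1)  # class indices, highest score first
    results = {}
    for k in ks:
        hits = (order[:, :k] == labels[:, None]).any(axis=1)
        results[f"acc_top{k}"] = float(hits.mean())
    return results


# Three samples, three classes (scores chosen without ties).
logits = np.array([[0.1, 0.9, 0.0],
                   [0.7, 0.1, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 0])
print(topk_accuracy(logits, labels, ks=(1, 2)))
```

Only the first sample is right at top-1, and the second sample's label is its second choice, so top-1 is 1/3 and top-2 is 2/3.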
Summary
The reproduced accuracy of some models differs slightly from the official figures. This is hard to avoid: the PyTorch-to-PaddlePaddle conversion and the data preprocessing can each introduce small differences. Overall, though, the gap is small (at most about 0.1 percentage points in top-1 accuracy), so the model conversion can be considered successful.
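The size of the gap can be checked directly from the numbers in this post: the official top-1 values from the table and the reproduced `acc_top1` values from the evaluation logs, both in percent.

```python
# (model, official top-1 from the table, reproduced top-1 from the eval logs), in %
results = [
    ("ConvNeXt-T", 82.1, 81.992),
    ("ConvNeXt-S", 83.1, 83.078),
    ("ConvNeXt-B", 83.8, 83.84),
    ("ConvNeXt-L", 84.3, 84.346),
    ("ConvNeXt-XL", 87.8, 87.75),
]

# Reproduced minus official, in percentage points
diffs = {name: round(repro - official, 2) for name, official, repro in results}
for name, d in diffs.items():
    print(f"{name}: {d:+.2f} points")
```

The largest deviation is ConvNeXt-T at about -0.11 points; B and L come out slightly above the official values.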
ConvNeXt is one of the strongest convolutional classification networks. At a time when Transformers dominate, ConvNeXt wins back ground for convolutional networks with its strong modeling ability. To be honest, though, one advantage of convolution over Transformers that cannot be replaced is that a convolutional network can take inputs of any size, whereas this is more troublesome for Transformers.
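The arbitrary-input-size point can be illustrated with ConvNeXt's downsampling path: a 4x4 stride-4 stem followed by three 2x2 stride-2 downsampling convolutions (the 7x7 depthwise convolutions are padded and keep the size). The input resolution only changes the size of the final feature map, and global average pooling then reduces any HxW map to one vector of C channels (768 for ConvNeXt-T), so the classifier head is unchanged. The helpers below are hypothetical; note that accepting a size does not guarantee the same accuracy at resolutions the model was not trained on.

```python
def conv_out(n, kernel, stride, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def convnext_feature_size(input_size):
    """Spatial size of ConvNeXt's last feature map for a square input."""
    size = conv_out(input_size, kernel=4, stride=4)  # stem: 4x4 conv, stride 4
    for _ in range(3):                               # downsampling before stages 2-4
        size = conv_out(size, kernel=2, stride=2)    # 2x2 conv, stride 2
    return size


for s in (224, 300, 384):
    print(s, "->", convnext_feature_size(s))  # 7, 9 and 12 respectively
```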
About the author
Name: Zhang Jin
School: Shanghai University of Applied Technology, second-year graduate student
Research interests: CV, salient object detection
AI Studio link: https://aistudio.baidu.com/aistudio/personalcenter/thirdview/635490
GitHub link: https://github.com/zhangjin12138
Honors: PPDE; CCF member; first place in the semi-supervised learning object localization competition of the 3rd China AI+ Innovation and Entrepreneurship Competition; first place in the CCF competition "Automatic recognition of water meter readings in real scenes"; one Zone 2 SCI paper, with N more to be submitted.
If you like this project, please give it a like.