Basic usage of PyTorch (data loading, type conversion)


This blog post will be released in Nuggets community first!

Preface

From the previous posts we already know the basic concepts of PyTorch tensors, along with some details of gradients and tensor copying. A tensor is very similar to a NumPy array, and in many cases we can compute with tensors directly. Now let's look at some basic uses of PyTorch.

After all, we use PyTorch to build neural networks for deep learning. As the earlier brief overview of machine learning mentioned, deep learning is a branch of machine learning, that is, machine learning with its own specialty. The machine learning workflow we saw with sklearn and Azure's cloud platform was roughly divided into five parts. PyTorch follows much the same steps, except that the algorithm is replaced by a more abstract neural network.

So we can roughly divide working with PyTorch into these pieces


Here, we will mainly focus on data loading and conversion.

Type conversion

At the beginning, we said that a tensor can be created from NumPy data, but sometimes we need to deal with text, pictures and sound, so we need a converter. (Of course, you can turn the data into a NumPy array first and then into a tensor, but that is a stopgap for when you are in a hurry.)
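As a rough sketch of that "in a hurry" route, the snippet below goes from an image to a NumPy array and then to a tensor by hand. The tiny synthetic image is an assumption standing in for a real picture, so nothing here depends on files on disk.

```python
import numpy as np
import torch
from PIL import Image

# Synthetic 4x2 pure-red image, standing in for a real photo (assumption)
image = Image.new("RGB", (4, 2), color=(255, 0, 0))

array = np.array(image)            # shape (height, width, channels), dtype uint8
tensor = torch.from_numpy(array)   # same layout, dtype torch.uint8
tensor = tensor.permute(2, 0, 1)   # reorder to (channels, height, width)
tensor = tensor.float() / 255.0    # scale to floats in [0, 1]

print(tensor.shape)
print(tensor[0])                   # the red channel
```

It works, but the axis reordering and scaling have to be written out every time, which is why a ready-made converter is more convenient.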

We use this toolkit here:

torchvision

For example, we convert images.


There is a lot more under this toolkit. ToTensor() converts an image directly (see the source code and docstrings).


Here we can easily complete the transformation.

Compose "chain transformation"

Sometimes we need several conversions in a row. For example, we may need to resize an image first and then convert it to a tensor. To avoid repeating code, we can chain the transforms with Compose.

from PIL import Image
from torchvision import transforms

# Compose applies its transforms in list order; only ToTensor is chained here
tensor_to = transforms.ToTensor()
compose = transforms.Compose([tensor_to])
image = Image.open("train/1/0BGHNV6P.jpg")

img = compose(image)
print(img)


There are other transforms as well, but I won't go through them here. PyCharm's autocompletion lists them all, and each one has a docstring.
Type conversion itself is simple, but there are many cases to cover, so it is hard to explain them all here.

Data processing

As we all know, machine learning is inseparable from data and datasets. For some well-known models and datasets, PyTorch provides automatic download tools.

Built-in datasets

This means that PyTorch downloads the dataset automatically and then packages it for us.
This also uses torchvision.
For example, to download the CIFAR10 dataset:

import torchvision

train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, download=True)
test_set = torchvision.datasets.CIFAR10(root="./dataset", train=False, download=True)

That is all it takes, but note that the samples obtained this way are PIL images, not tensors, so we need a type conversion:

import torchvision
from torchvision import transforms

# Passing transform= converts every image to a tensor when it is accessed
trans = transforms.Compose([transforms.ToTensor()])
dataset = torchvision.datasets.CIFAR10(root="./dataset", train=False, transform=trans, download=True)

Data loading

Then we load the data.
The tools under torch.utils.data are used here.

import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

trans = transforms.Compose([transforms.ToTensor()])
dataset = torchvision.datasets.CIFAR10(root="./dataset", train=False, transform=trans, download=True)

# Wrap the dataset so it is served in batches of 64 samples
dataloader = DataLoader(dataset, batch_size=64)

Here we mainly introduce some parameters of DataLoader.
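A small sketch of the most commonly used DataLoader parameters follows. The ten-sample in-memory dataset is an assumption made up for the example, so the effect of each parameter is easy to check by hand.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 samples with 2 features each (assumption, for illustration only)
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

loader = DataLoader(
    dataset,
    batch_size=4,     # samples per batch
    shuffle=True,     # reshuffle the sample order every epoch
    num_workers=0,    # 0 means data is loaded in the main process
    drop_last=True,   # discard the final batch if it is smaller than batch_size
)

batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)

# With drop_last=False the leftover samples form one extra, smaller batch
keep_loader = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=False)
print(len(loader), len(keep_loader))
```

With 10 samples and a batch size of 4, drop_last=True leaves two full batches, while drop_last=False adds a third batch holding the remaining two samples.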

Custom data loading

This is the more hands-on approach: sometimes we need to load a dataset ourselves.

For example, take a dataset downloaded from the Internet that we now need to import into PyTorch.
Each folder name is the label of the images inside it: folder 1 holds pictures of 1 yuan and folder 100 holds pictures of 100 yuan.

Here is the code:

from torch.utils.data import Dataset,DataLoader
from torchvision import transforms
import os
from PIL import Image

# Get data through Dataset
class MyDataset(Dataset):

    def __init__(self,RootDir,LabelDir):
        self.RootDir = RootDir

        self.LabelDir = LabelDir
        self.transform = transforms.ToTensor()
        self.ImagePathDir = os.path.join(self.RootDir,self.LabelDir)
        self.ImageNameItems = os.listdir(self.ImagePathDir)

    def __getitem__(self, item):
        # item is the index of one sample; loading is lazy, so the image is only read when requested
        ItemName = self.ImageNameItems[item]
        ImagePathItem = os.path.join(self.RootDir, self.LabelDir, ItemName)
        # Convert to RGB so every sample has 3 channels, then resize so samples can be batched
        ItemGet = self.transform(Image.open(ImagePathItem).convert("RGB").resize((500, 500)))
        ItemLabel = self.LabelDir
        return ItemGet, ItemLabel

    def __len__(self):
        return len(self.ImageNameItems)




if __name__ == "__main__":
    RootDir = "train"
    OneYuanLabel = "1"
    HundredYuanLabel = "100"
    OneYuanData = MyDataset(RootDir, OneYuanLabel)
    HundredData = MyDataset(RootDir, HundredYuanLabel)

    # Adding two Dataset objects yields a ConcatDataset covering both folders
    DataGet = OneYuanData + HundredData

    train_data = DataLoader(dataset=DataGet, batch_size=18, shuffle=True, num_workers=0, drop_last=True)

    for data in train_data:
        imgs, tags = data
        print(imgs.shape)



The key point is that we inherit from Dataset and implement the __getitem__ magic method (along with __len__). The code is simple: __init__ collects the image file names under the path, and when __getitem__ is called we read the picture and convert it directly into a tensor. This is similar to the built-in dataset above, except that we do the conversion ourselves. It is also why we use DataLoader to fetch the data: images are read batch by batch as they are needed, rather than all being loaded before training starts, which would be very slow.
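The same pattern can be tried without any image files. In the sketch below, ConstantDataset is a hypothetical toy class made up for this example: it builds each sample only when it is requested, and two instances are joined with + just like the two folders.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class ConstantDataset(Dataset):
    """Hypothetical toy dataset: every sample is a 3x8x8 tensor filled with one value."""

    def __init__(self, value, length, label):
        self.value = float(value)
        self.length = length
        self.label = label

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # The sample is created only now, when it is requested (lazy loading)
        return torch.full((3, 8, 8), self.value), self.label

    def __len__(self):
        return self.length


ones = ConstantDataset(1, length=5, label="1")
hundreds = ConstantDataset(100, length=7, label="100")

combined = ones + hundreds   # Dataset.__add__ returns a ConcatDataset
print(type(combined).__name__, len(combined))

loader = DataLoader(combined, batch_size=4, shuffle=True, num_workers=0, drop_last=True)
shapes = []
for imgs, tags in loader:
    shapes.append(tuple(imgs.shape))   # tags holds the string labels of the batch
print(shapes)
```

String labels are passed through as they are, so in a real training loop they would still need to be mapped to integer class indices.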

Summary

These are the most basic operations. Tomorrow we will talk about how to build a neural network with PyTorch, taking a CNN on CIFAR10 as the example. We'll do a small demo later.

In fact, using PyTorch is very simple, but it has many prerequisites, and without them it is hard to understand. This is unlike business frameworks such as Python's Django or Java's SSM and Spring Cloud, where memorizing a few APIs and annotations is enough to get started. Of course, reading the source code is a different matter.

Keywords: Python AI Pytorch

Added by phonydream on Wed, 26 Jan 2022 20:33:43 +0200