Learning notes: 3.6 "Softmax regression from scratch" from Dive into Deep Learning (PyTorch)

Preface

Softmax regression, also known as multinomial or multi-class logistic regression, is the generalization of logistic regression to multi-class classification problems.
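Written out (matching the net() function implemented below, where o denotes the linear output of the model):

softmax(o)_j = exp(o_j) / sum_k exp(o_k),   with o = xW + b

so the outputs are non-negative and sum to 1 over the classes, and can be read as class probabilities.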

1, Training set and test set

Use the Fashion-MNIST dataset obtained in the previous section.

2, Steps

1. Import packages and modules

import torch
import torchvision
import numpy as np
import sys
sys.path.append("..") # so that d2lzh_pytorch can be imported from the parent directory
from d2lzh_pytorch import *
import d2lzh_pytorch as d2l

2. Read data

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

d2l.load_data_fashion_mnist(batch_size)

This function bundles the data-loading steps from the previous section into a single call.
It has been saved in the d2lzh_pytorch package:

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))  # skipped here because resize=None
    trans.append(torchvision.transforms.ToTensor())  # convert the PIL images to tensors
    
    transform = torchvision.transforms.Compose(trans)  # chain the transforms together
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)  # download/load the training set, as in the previous section
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)  # download/load the test set, as in the previous section
    
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means the data is read in the main process (no extra worker processes)
    else:
        num_workers = 4  # use 4 worker processes to read the data
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    return train_iter, test_iter  

The following explains the second if statement:
We train the model on the training dataset and evaluate the trained model's performance on the test dataset. As mentioned earlier, mnist_train is a subclass of torch.utils.data.Dataset, so it can be passed to torch.utils.data.DataLoader to create a DataLoader instance that reads mini-batches of data samples.
In practice, data reading is often the performance bottleneck of training, especially when the model is simple or the computing hardware is fast. A convenient feature of PyTorch's DataLoader is that it allows multiple worker processes to speed up data reading. Here we pass the num_workers parameter to set up four processes for reading data.
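As a quick sanity check on the loaders (a minimal sketch; the shapes assume the default 28x28 Fashion-MNIST images and batch_size = 256):

for X, y in train_iter:
    print(X.shape, y.shape)  # torch.Size([256, 1, 28, 28]) torch.Size([256])
    break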

3. Initialize model parameters

num_inputs = 784   # each 28x28 image is flattened into a vector of length 28 * 28 = 784
num_outputs = 10   # Fashion-MNIST has 10 classes

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs,num_outputs)),
                 dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)

W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

4. Define model

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition

def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)),W) + b)

X.exp()

Returns e raised to the power of each element of X (the element-wise exponential).

X_exp.sum(dim=1, keepdim=True)

torch.sum() can sum over one dimension of the input tensor in two ways:
summing the elements in the same column (dim=0) or in the same row (dim=1). With keepdim=True the result keeps both the row and column dimensions, so here the row sums form a column vector that can be broadcast against X_exp.
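A small illustration of the two modes, using a made-up 2x3 tensor:

X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(X.sum(dim=0, keepdim=True))  # tensor([[5., 7., 9.]])  - column sums, shape (1, 3)
print(X.sum(dim=1, keepdim=True))  # tensor([[ 6.], [15.]])  - row sums, shape (2, 1)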

.mm() performs matrix multiplication; .view() reshapes the tensor (here it flattens each image into a row of length num_inputs).
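A quick check that softmax behaves as expected (random input, so the exact values will differ on each run, but every row of the output should sum to 1):

X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob)             # all entries lie between 0 and 1
print(X_prob.sum(dim=1))  # each row sums to 1 (up to floating-point error)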

5. Define loss function

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
y_hat.gather(1, y.view(-1, 1))

def cross_entropy(y_hat,y):
    return - torch.log(y_hat.gather(1, y.view(-1,1)))

.LongTensor()

Converts to the LongTensor (64-bit integer) type, which gather requires for its index argument.

torch.gather(input, dim, index, out=None) → Tensor

The returned tensor has the same shape as index.
dim indicates which dimension the values in index refer to. This function makes it easy to extract elements at specified positions.
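With the y_hat and y defined above, gather picks out each example's predicted probability for its true label, and the loss is just the negative log of those values (a worked trace):

print(y_hat.gather(1, y.view(-1, 1)))  # tensor([[0.1000], [0.5000]])
print(cross_entropy(y_hat, y))         # tensor([[2.3026], [0.6931]]), i.e. -log(0.1) and -log(0.5)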

6. Calculate classification accuracy

def accuracy(y_hat,y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()

print(accuracy(y_hat, y))

.argmax(dim=1)

Returns the index of the maximum value along dim=1, i.e. the predicted class of each example.
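Tracing the example from above:

print(y_hat.argmax(dim=1))        # tensor([2, 2]) - the predicted classes
print(y_hat.argmax(dim=1) == y)   # only the second prediction matches y = [0, 2]
print(accuracy(y_hat, y))         # 0.5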

.item()

import torch
x = torch.randn(2, 2)
print(x) 
print(x[1,1])
print(x[1,1].item())

tensor([[ 0.4702,  0.5145],
        [-0.0682, -1.4450]]) 
tensor(-1.4450)
-1.445029854774475 

As can be seen, the difference is display precision: .item() returns a full-precision Python float, so when computing a loss or an accuracy we generally call .item() rather than taking the corresponding element x[1,1] directly.

# This function has been saved in d2lzh_pytorch package for later use. This function will be improved step by step: its complete implementation will be described in the "image augmentation" section
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0  # running count of correct predictions and of examples seen
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
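Because W and b are still just their random initial values at this point, evaluating the untrained model should give an accuracy close to random guessing over 10 classes:

print(evaluate_accuracy(test_iter, net))  # roughly 0.1 before training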

7. Training model

num_epochs, lr = 4, 0.1
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

train_ch3() is saved in the d2lzh_pytorch package:

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # Gradient clearing
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
                    
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # The section "concise implementation of softmax regression" will be used
                
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc)) 
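The sgd() call above is the mini-batch stochastic gradient descent helper from the earlier linear regression section, also saved in d2lzh_pytorch; roughly, it performs the following update (a sketch, not necessarily the exact library code):

def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size  # in-place update; dividing by batch_size averages the summed gradients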
         

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

net - the model defined above
train_iter - training data iterator
test_iter - test data iterator
loss - the loss function (here cross_entropy)
num_epochs - number of training epochs
batch_size - batch size
params=[W, b] - model parameters
lr - learning rate (step size)
optimizer=None - no optimizer is passed here, so the manual sgd() update is used

8. Prediction

X, y = next(iter(test_iter))  # take one batch of images and labels from the test set

true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]  # true label on the first line, prediction on the second

d2l.show_fashion_mnist(X[0:9], titles[0:9])  # plot the first 9 test images with their titles

Summary

Softmax regression extends logistic regression to multi-class classification. Implementing it from scratch boils down to loading the Fashion-MNIST data, initializing W and b, defining the softmax operation, the model, the cross-entropy loss and the accuracy metric, and then training with mini-batch gradient descent and evaluating on the test set.

Keywords: PyTorch, deep learning, logistic regression
