Preface
Softmax regression, also known as multinomial or multi-class logistic regression, is the generalization of logistic regression to multi-class classification problems.
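For a single example with feature vector x, the model first computes a score (logit) for each class, and the softmax operation then turns these scores into a probability distribution over the classes; this is exactly the model implemented in step 4 below:

$$
\hat{y}_j = \mathrm{softmax}(\mathbf{o})_j = \frac{\exp(o_j)}{\sum_{k} \exp(o_k)}, \qquad \mathbf{o} = \mathbf{x}\mathbf{W} + \mathbf{b}
$$

The predicted class is the one with the largest probability.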
1, Training set and test set
We use the Fashion-MNIST dataset obtained in the previous section.
2, Steps
1. Import packages
```python
import torch
import torchvision
import numpy as np
import sys
sys.path.append("..")  # so that d2lzh_pytorch can be imported from the parent directory
from d2lzh_pytorch import *
import d2lzh_pytorch as d2l
```
2. Read data
```python
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
```
d2l.load_data_fashion_mnist(batch_size)
This function integrates the data-loading steps from the previous section into a single function.
It has been saved in the d2lzh_pytorch package:
```python
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))  # skipped here because resize=None
    trans.append(torchvision.transforms.ToTensor())  # convert images to tensors
    transform = torchvision.transforms.Compose(trans)

    # Download/read the training and test sets, as in the previous section
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes are used to read the data
    else:
        num_workers = 4  # use 4 worker processes to read the data
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter
```
The following is an explanation of the second if statement:
We will train the model on the training dataset and evaluate the trained model's performance on the test dataset. As mentioned earlier, mnist_train is a subclass of torch.utils.data.Dataset, so we can pass it to torch.utils.data.DataLoader to create a DataLoader instance that reads mini-batches of data samples.
In practice, data reading is often a performance bottleneck during training, especially when the model is simple or the computing hardware is fast. A convenient feature of PyTorch's DataLoader is that it allows multiple worker processes to speed up data reading. Here we use the num_workers parameter to set up four processes to read the data.
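As a quick sanity check (a minimal sketch, assuming batch_size = 256 as above), you can pull one batch from the iterator and inspect its shape; each Fashion-MNIST image is a single-channel 28×28 tensor:

```python
X, y = next(iter(train_iter))
print(X.shape, y.shape)  # expected: torch.Size([256, 1, 28, 28]) torch.Size([256])
```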
3. Initialize model parameters
```python
num_inputs = 784
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)

W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
```
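Each 28×28 Fashion-MNIST image is flattened into a vector of 784 inputs and there are 10 clothing categories, which is why W has shape (num_inputs, num_outputs) = (784, 10) and b has shape (10,). A quick check:

```python
print(W.shape, b.shape)  # torch.Size([784, 10]) torch.Size([10])
```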
4. Define model
```python
def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition

def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)
```
X.exp()
Returns e raised to the power of each element of X (the element-wise exponential).
X_exp.sum(dim=1, keepdim=True)
torch.sum() can sum over a dimension of the input tensor in two ways: summing the elements of each column (dim=0) or of each row (dim=1). With keepdim=True, the result keeps both the row and column dimensions (here, shape (batch_size, 1)).
.mm() performs matrix multiplication; .view() reshapes the tensor (here, flattening each image into a row of length num_inputs). A small illustration of these operations follows below.
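A small, self-contained illustration of these operations (the numbers are arbitrary and only for demonstration):

```python
import torch

X = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])
X_exp = X.exp()                             # element-wise e^x
partition = X_exp.sum(dim=1, keepdim=True)  # row sums, shape (2, 1)
print(X_exp / partition)                    # broadcasting: each row now sums to 1

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[1.0], [1.0]])
print(torch.mm(A, B))     # matrix multiplication, result has shape (2, 1)
print(A.view((-1, 4)))    # reshape to 1 x 4; -1 lets PyTorch infer that dimension
```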
5. Define loss function
```python
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
y_hat.gather(1, y.view(-1, 1))

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))
```
.LongTensor()
Converts to LongTensor (64-bit integer) type.
torch.gather(input, dim, index, out=None) → Tensor
The returned tensor has the same shape as index.
dim indicates the dimension along which the values of index select elements. This function makes it easy to extract elements at specified positions.
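Using the y_hat and y defined above, gather picks, for each row of y_hat, the predicted probability of the true class, which is exactly the quantity the cross-entropy loss takes the negative log of:

```python
print(y_hat.gather(1, y.view(-1, 1)))
# tensor([[0.1000],
#         [0.5000]])
print(cross_entropy(y_hat, y))  # -log of the probabilities above
# tensor([[2.3026],
#         [0.6931]])
```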
6. Calculate classification accuracy
```python
def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()

print(accuracy(y_hat, y))
```
.argmax(dim=1)
Returns the index of the maximum value along dimension 1 (i.e., within each row).
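Continuing with the same y_hat and y: argmax(dim=1) gives the predicted class for each sample, and accuracy compares these indices with the true labels:

```python
print(y_hat.argmax(dim=1))  # tensor([2, 2]): predicted classes for the two samples
print(y)                    # tensor([0, 2]): true classes
# Only the second prediction is correct, so accuracy(y_hat, y) prints 0.5.
```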
.item()
```python
import torch
x = torch.randn(2, 2)
print(x)
print(x[1, 1])
print(x[1, 1].item())
```

```
tensor([[ 0.4702,  0.5145],
        [-0.0682, -1.4450]])
tensor(-1.4450)
-1.445029854774475
```
As you can see, the difference lies in the display precision: .item() returns a Python float, so when computing losses or accuracy we generally use .item() rather than directly taking the corresponding tensor element x[1, 1].
```python
# This function has been saved in the d2lzh_pytorch package for later use.
# It will be improved step by step; its complete implementation is described
# in the "image augmentation" section.
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
```
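Since the parameters were initialized randomly, the untrained model should be no better than random guessing over 10 classes, so evaluating it on the test set should give an accuracy of roughly 0.1:

```python
print(evaluate_accuracy(test_iter, net))  # roughly 0.1 before training
```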
7. Train the model
```python
num_epochs, lr = 4, 0.1
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
```
train_ch3() is saved in the d2lzh_pytorch package:
```python
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            # Clear gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "concise implementation of softmax regression" section

            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
```
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
net: the model
train_iter: training data iterator
test_iter: test data iterator
loss: the loss function (here cross_entropy)
num_epochs: number of training epochs
batch_size: batch size
params=[W, b]: model parameters
lr: learning rate (step size)
optimizer=None: no optimizer is passed, so the sgd helper is used instead (see the sketch below)
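When no optimizer is passed, train_ch3 falls back to the sgd helper. For reference, here is a minimal sketch of that helper, assuming the definition used in the book's earlier linear-regression section (it is likewise saved in the d2lzh_pytorch package):

```python
def sgd(params, lr, batch_size):
    # Mini-batch stochastic gradient descent:
    # update each parameter in place using its accumulated gradient,
    # averaged over the batch.
    for param in params:
        param.data -= lr * param.grad / batch_size
```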
8. Prediction
```python
X, y = next(iter(test_iter))

true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])
```
Summary
Learning notes on Section 3.6, "Implementation of softmax regression from scratch", of Dive into Deep Learning (Hands-on Deep Learning) + PyTorch.