Assignment 2: Multilayer perceptron

1, Linear neural network

(1) Linear regression

1. Linear model

The linear model can be regarded as a single-layer neural network.
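In formula form, for an input vector $\mathbf{x}$ with weights $\mathbf{w}$ and bias $b$, the prediction is

$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b,$$

or $\hat{\mathbf{y}} = \mathbf{X}\mathbf{w} + b$ when the whole design matrix $\mathbf{X}$ is used at once.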

2. Loss function

The loss function can quantify the difference between the actual value and the predicted value of the target.
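For regression, the squared loss is the common choice: for the $i$-th sample,

$$\ell^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2,$$

and the loss over the whole dataset is the average $L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \ell^{(i)}(\mathbf{w}, b)$; the factor $\frac{1}{2}$ matches the squared_loss implementation below.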

  3. Analytical solution
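Linear regression happens to have a closed-form solution: folding the bias into $\mathbf{w}$ by appending a column of ones to $\mathbf{X}$, minimizing the squared loss gives

$$\mathbf{w}^{*} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}.$$

Most other models have no analytical solution, which is why gradient-based optimization is needed.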

 

  4. Optimization method: minibatch stochastic gradient descent

When there is no analytical solution, gradient descent reduces the error by repeatedly updating the parameters in the direction that decreases the loss function: we compute the derivative (gradient) of the loss function with respect to the model parameters. In practice this can be very slow, because we would have to traverse the entire dataset before every parameter update. Therefore, we usually sample a small batch of examples at random each time we need to compute an update. This variant is called minibatch stochastic gradient descent, with batch size b.
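Written out, with learning rate $\eta$ and a randomly sampled minibatch $\mathcal{B}$ of size $b$, each update is

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{b} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, \ell^{(i)}(\mathbf{w}, b),$$

which is exactly what the sgd function below implements (the division by batch_size normalizes the summed gradient).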

  (2) Implementation of linear regression

1. Generate dataset

import random

import torch

def synthetic_data(w, b, num_examples):  #@save
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

2. Read dataset

Shuffle the samples in the dataset and read them in minibatches. We define a data_iter function that takes the batch size, the feature matrix, and the label vector as inputs and yields minibatches of size batch_size. Each minibatch contains a set of features and labels.

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # These samples are read randomly without a specific order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

3. Initialize model parameters

The weights are initialized by sampling random numbers from a normal distribution with mean 0 and standard deviation 0.01, and the bias is initialized to 0.

w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

4. Define model

def linreg(X, w, b):  #@save
    """Linear regression model."""
    return torch.matmul(X, w) + b

5. Define loss function

Here we use the squared loss function. In the implementation, we need to reshape the true values y to the same shape as the predictions y_hat.

def squared_loss(y_hat, y):  #@save
    """Mean square loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

6. Define optimization algorithm

This function accepts the set of model parameters, learning rate and batch size as inputs. The size of each update step is determined by the learning rate lr. Because the loss we calculate is the sum of a batch of samples, we use batch_size to normalize the step size, so that the step size does not depend on our choice of batch size.

def sgd(params, lr, batch_size):  #@save
    """Small batch random gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

7. Training

The number of epochs num_epochs and the learning rate lr are both hyperparameters. Setting hyperparameters is difficult and requires adjustment through repeated experiments.

lr = 0.03
num_epochs = 3
batch_size = 10
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)  # Minibatch loss on X and y
        # Because l has shape (batch_size, 1) rather than being a scalar,
        # sum all of its elements and compute the gradients w.r.t. [w, b]
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # Update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

(3) Simple implementation of linear regression

A deep learning framework is used to concisely implement the linear regression model in (2).

1. Generate dataset

import numpy as np
import torch
from torch.utils import data
from d2l import torch as d2l


true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)

2. Read dataset

We can call existing APIs in the framework to read the data. We pass features and labels as arguments and specify batch_size when instantiating the data iterator object. In addition, the Boolean value is_train indicates whether we want the data iterator object to shuffle the data in each epoch.

def load_array(data_arrays, batch_size, is_train=True):  #@save
    """Construct a PyTorch Data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

3. Define model

In PyTorch, the fully connected layer is defined in the Linear class. It is worth noting that we pass two arguments to nn.Linear. The first specifies the input feature dimension, i.e. 2, and the second specifies the output feature dimension. The output is a single scalar, so the output dimension is 1.

# nn is an abbreviation for neural networks
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

4. Initialize model parameters

Just as we specified the input and output dimensions when constructing nn.Linear, we now access the parameters directly to set their initial values. We select the first layer in the network through net[0], then use weight.data and bias.data to access the parameters, and use the in-place methods normal_ and fill_ to overwrite the parameter values. Here, each weight parameter is randomly sampled from a normal distribution with mean 0 and standard deviation 0.01, and the bias parameter is initialized to zero.

net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)

5. Define loss function

The MSELoss class is used to calculate the mean squared error (also known as the squared L2 norm). By default, it returns the average of all sample losses.

loss = nn.MSELoss()

6. Define optimization algorithm

The minibatch stochastic gradient descent algorithm is a standard tool for optimizing neural networks. PyTorch implements many variants of this algorithm in its optim module. When we instantiate an SGD instance, we need to specify the parameters to optimize (obtained from our model via net.parameters()) and a dictionary of hyperparameters required by the optimization algorithm. Minibatch SGD only requires setting the lr value, which is set to 0.03 here.

trainer = torch.optim.SGD(net.parameters(), lr=0.03)

7. Training

In each epoch, we completely traverse the dataset (train_data), repeatedly obtaining a minibatch of inputs and the corresponding labels. For each minibatch, we carry out the following steps:

  • Generate predictions by calling net(X) and compute the loss l (forward propagation).

  • Compute the gradients by running backpropagation.

  • Update the model parameters by calling the optimizer.

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

(4) Softmax regression

  Common loss functions: (1) L2 loss; (2) L1 loss; (3) Huber's robust loss.
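Their usual definitions, with prediction $y'$ and target $y$ (Huber's loss with threshold 1), are

$$\text{L2: } \ell(y, y') = \frac{1}{2}(y - y')^2, \qquad \text{L1: } \ell(y, y') = |y - y'|,$$

$$\text{Huber: } \ell(y, y') = \begin{cases} |y - y'| - \frac{1}{2}, & |y - y'| > 1 \\ \frac{1}{2}(y - y')^2, & \text{otherwise.} \end{cases}$$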

(5) Image classification dataset

%matplotlib inline
import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l

d2l.use_svg_display()

1. Read dataset

The Fashion-MNIST dataset can be downloaded and read into memory through the framework's built-in functions.

# The image data is transformed from PIL type to 32-bit floating-point format through ToTensor instance
# Divide by 255 so that the values of all pixels are between 0 and 1
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="../data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="../data", train=False, transform=trans, download=True)

Fashion-MNIST contains 10 categories; each category consists of 6000 training images and 1000 test images. The following function converts between numeric label indices and their text names.

def get_fashion_mnist_labels(labels):  #@save
    """return Fashion-MNIST The text label of the dataset."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

2. Read small batch

Using the built-in data iterator, the data loader reads a minibatch of data of size batch_size in each iteration. We also randomly shuffle all samples in the training data iterator.

batch_size = 256

def get_dataloader_workers():  #@save
    """Four processes are used to read data."""
    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())

Take a look at the time it takes to read the training data.

timer = d2l.Timer()
for X, y in train_iter:
    continue
f'{timer.stop():.2f} sec'

3. Consolidate all components

We define the load_data_fashion_mnist function to obtain and read the Fashion-MNIST dataset. It returns the data iterators for the training set and the validation set. In addition, it accepts an optional argument resize, used to resize the images to another shape.

def load_data_fashion_mnist(batch_size, resize=None):  #@save
    """download Fashion-MNIST The dataset and then load it into memory."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

Test the image-resizing feature of the load_data_fashion_mnist function by specifying the resize argument.

train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
for X, y in train_iter:
    print(X.shape, X.dtype, y.shape, y.dtype)
    break
torch.Size([32, 1, 64, 64]) torch.float32 torch.Size([32]) torch.int64

(6) Implementation of Softmax regression

import torch
from IPython import display
from d2l import torch as d2l


batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

1. Initialize model parameters

Each sample in the original dataset is a 28 × 28 image. In this section, we will flatten each image into a vector of length 784. In softmax regression, we have as many outputs as there are categories. Because our dataset has 10 categories, the network output dimension is 10. As with linear regression, we initialize our weights W from a normal distribution and the bias to 0.

num_inputs = 784
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

2. Define softmax operations

softmax consists of three steps: (1) exponentiate each term (using exp); (2) sum over each row (each sample is a row in the minibatch) to obtain the normalization constant of each sample; (3) divide each row by its normalization constant so that the result sums to 1.

def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # The broadcast mechanism is applied here

3. Define model

The following code defines how the input is mapped to the output through the network. Note that before passing the data to our model, we use the reshape function to flatten each original image into a vector.

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

4. Define loss function
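Softmax regression uses the cross-entropy loss. A minimal from-scratch sketch (assuming y holds integer class indices, as in Fashion-MNIST) picks out the predicted probability of the true class for each sample and takes its negative log:

def cross_entropy(y_hat, y):
    """Cross-entropy loss: negative log-probability of the true class."""
    return -torch.log(y_hat[range(len(y_hat)), y])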

5. Classification accuracy
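Classification accuracy is the fraction of predictions that match the labels. A sketch that counts the correct predictions in a batch (taking the class with the largest predicted probability as the prediction):

def accuracy(y_hat, y):
    """Count the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)  # most likely class per row
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())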

6. Training
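Training can reuse the d2l.train_ch3 routine (the same one used for the multilayer perceptron later). A sketch, where lr = 0.1 and num_epochs = 10 are assumed example values:

lr = 0.1

def updater(batch_size):
    # d2l.sgd is the minibatch SGD routine saved with #@save earlier
    return d2l.sgd([W, b], lr, batch_size)

num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)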

7. Prediction
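To check predictions, run the trained model on one test minibatch and compare the predicted text labels with the true ones (get_fashion_mnist_labels was saved into d2l above); a small sketch:

for X, y in test_iter:
    break  # take a single minibatch from the test set
trues = d2l.get_fashion_mnist_labels(y)
preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
print(list(zip(trues, preds))[:6])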

(7) Concise implementation of Softmax regression
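A minimal sketch of the concise version (assuming the same train_iter and test_iter as above), mirroring the pattern used later for the MLP:

from torch import nn

# A flatten layer followed by a single fully connected layer with 10 outputs
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.1)

num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)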

2, Multilayer perceptron

(1) Multilayer perceptron

  (2) Implementation of multi-layer perceptron

1. Initialize model parameters

import torch
from torch import nn
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = nn.Parameter(torch.randn(
    num_inputs, num_hiddens, requires_grad=True) * 0.01)
b1 = nn.Parameter(torch.zeros(num_hiddens, requires_grad=True))
W2 = nn.Parameter(torch.randn(
    num_hiddens, num_outputs, requires_grad=True) * 0.01)
b2 = nn.Parameter(torch.zeros(num_outputs, requires_grad=True))

params = [W1, b1, W2, b2]

2. Activation function

def relu(X):
    a = torch.zeros_like(X)
    return torch.max(X, a)

3. Model

We use reshape to convert each two-dimensional image into a vector of length num_inputs.

def net(X):
    X = X.reshape((-1, num_inputs))
    H = relu(X@W1 + b1)  # Here "@" represents matrix multiplication
    return (H@W2 + b2)

4. Loss function

We directly use the built-in function in the high-level API to calculate the softmax and the cross-entropy loss.

loss = nn.CrossEntropyLoss()

5. Training

The training process of a multilayer perceptron is exactly the same as that of softmax regression. We can directly call the train_ch3 function of the d2l package.

num_epochs, lr = 10, 0.1
updater = torch.optim.SGD(params, lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)

  (3) Simple implementation of multi-layer perceptron

import torch
from torch import nn
from d2l import torch as d2l

Compared with the concise implementation of softmax regression, the only difference is that we add two fully connected layers (previously we only added one). The first layer is the hidden layer, which contains 256 hidden units and uses the ReLU activation function. The second layer is the output layer.

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights);
batch_size, lr, num_epochs = 256, 0.1, 10
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

(4) Model selection, underfitting and overfitting

Training error: the error of the model computed on the training dataset.

Generalization error: the error of the model computed on new data drawn from the same distribution.

Validation dataset: a dataset held out from training, used to evaluate model quality and select hyperparameters.

Test dataset: a dataset used only once, after all model choices have been made.

k-fold cross-validation: split the training data into k non-overlapping parts; run k rounds, each time training on k-1 parts and validating on the remaining part, then average the k validation errors.

Overfitting: the training error is much lower than the validation error; the model has fit noise in the training data.

Underfitting: both the training error and the validation error are high; the model is not expressive enough to capture the underlying pattern.

(5) Weight decay
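Weight decay (L2 regularization) adds a penalty $\frac{\lambda}{2}\|\mathbf{w}\|^2$ to the loss to keep the weights small. In PyTorch it can be applied directly through the optimizer's weight_decay argument; the value wd = 3 below is only an assumed example:

wd = 3  # assumed example value of the weight-decay strength
trainer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=wd)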

 

(6) Dropout  
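Dropout randomly zeroes each hidden unit with probability p during training (rescaling the remaining ones by 1/(1-p)), which acts as a regularizer. A sketch that inserts nn.Dropout after each hidden layer's activation; the probabilities 0.2 and 0.5 are assumed example values:

dropout1, dropout2 = 0.2, 0.5  # assumed example dropout probabilities

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(dropout1),
                    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(dropout2),
                    nn.Linear(256, 10))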
