Introduction to pytorch


This article is only a record of my own learning. If there is any infringement, it will be deleted immediately.

Some useful resources

https://github.com/jcjohnson/pytorch-examples (some small PyTorch example projects)

https://pytorch.org/tutorials/ (the official PyTorch tutorials)

Fastai

Michael Nielsen's "Neural Networks and Deep Learning" covers derivations of the underlying mathematics, but it is not required for learning PyTorch

English online: http://neuralnetworksanddeeplearning.com/about.html

Chinese version:

Convex optimization by Stephen Boyd and Lieven Vandenberghe

3Blue1Brown's videos explain the relevant mathematics: https://space.bilibili.com/88461692

PyTorch official tutorial

Here are some notes taken from PyTorch's official tutorials.

https://pytorch.org/tutorials/ is the original text; below is my own translation and study of it

LEARN THE BASICS

Authors: Suraj Subramanian, Seth Juarez, Cassie Breviu, Dmitry Soshnikov, Ari Bornstein

Most machine learning workflows involve processing data, creating models, optimizing model parameters, and saving trained models. Next, let's follow the tutorial to learn about PyTorch and then experiment on the MNIST dataset

There are two ways to follow along. One is to run the sample code in a notebook in the cloud:

each section has a "Run in Microsoft Learn" link at the top.

The other is to set up the environment locally and run the code there

QUICKSTART

PyTorch has two primitives for working with data:
torch.utils.data.DataLoader and torch.utils.data.Dataset
Dataset stores the samples and their corresponding labels

DataLoader wraps the Dataset in an iterable (an iterator interface)

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

PyTorch provides domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. In this tutorial, we will use a TorchVision dataset.

The torchvision.datasets module contains many datasets, such as COCO and CIFAR

Every TorchVision dataset includes two arguments, transform and target_transform, which are used to modify the samples and the labels respectively.

Example:

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset and supports automatic batching, sampling, shuffling, and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break

Note: from the above, we can see the logic of how PyTorch reads a dataset and turns it into an iterable. Next, consider how to create a model

Creating Models

To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU (if available).

# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

The interpretation of this network structure is left for a later section
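
Before moving on, a quick sanity check can help (a small sketch of my own, not from the tutorial): pass a dummy batch through the model and confirm that it produces one logit per class.

# Sketch: feed a fake batch of 64 FashionMNIST-sized images through the model
dummy = torch.rand(64, 1, 28, 28, device=device)
out = model(dummy)
print(out.shape)  # expected: torch.Size([64, 10]) -- one row per image, one column per class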

Optimizing the Model Parameters

After the network structure is determined, we need to consider how to train the network's parameters. For that, we need to define a loss function and an optimizer

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

The first line defines the cross-entropy loss; the second line defines an SGD optimizer.
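
As a small illustration of how the loss is used (my own sketch, not from the tutorial): nn.CrossEntropyLoss expects raw logits of shape [N, C] and integer class indices of shape [N], and returns a scalar.

# Sketch: compute the cross-entropy loss on a dummy batch of 4 samples and 10 classes
dummy_logits = torch.randn(4, 10)           # raw, unnormalized scores from a model
dummy_labels = torch.tensor([3, 7, 0, 9])   # ground-truth class indices
print(loss_fn(dummy_logits, dummy_labels))  # a single scalar loss value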

In a single training loop, the model makes predictions on the training dataset (fed to it in batches) and backpropagates the prediction error to adjust the model's parameters.

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)          # forward pass: the model's predicted outputs
        loss = loss_fn(pred, y)  # compute the loss between prediction and target

        # Backpropagation
        optimizer.zero_grad()    # zero the accumulated gradients
        loss.backward()          # backpropagate the loss
        optimizer.step()         # update the parameters

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

Take a closer look at this code alongside the model defined earlier; it is explained in more detail later, together with the comments.

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

The training process runs over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model's accuracy and loss at each epoch; we would like to see the accuracy increase and the loss decrease with every epoch.

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Saving and loading models

torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

That is the save operation; the following loads the saved weights back into a fresh model instance:

model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))
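
After loading the weights, the model can be used to make predictions. A minimal inference sketch (following the official quickstart; the classes list below contains the standard FashionMNIST label names, added by me):

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

model.eval()                              # switch to evaluation mode
x, y = test_data[0][0], test_data[0][1]   # first sample of the test set
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')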

This concludes the quickstart chapter. The next chapter introduces tensors

Tensors

This chapter mainly introduces the concept of tensor

Tensors are a specialized data structure very similar to arrays and matrices; they can be understood as PyTorch's basic storage structure. We use tensors to encode the inputs and outputs of a model, as well as the model's parameters.

Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other hardware accelerators. Tensors and NumPy arrays can often share the same underlying memory, without copying data

import torch
import numpy as np
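
As a quick illustration of the shared-memory point above (a sketch of my own): an in-place change to a CPU tensor is also visible in the NumPy array created from it.

t = torch.ones(5)
n = t.numpy()   # n shares the same underlying memory as t (on the CPU)
t.add_(1)       # an in-place change to the tensor...
print(t)        # tensor([2., 2., 2., 2., 2.])
print(n)        # [2. 2. 2. 2. 2.]  ...is reflected in the NumPy array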

Initializing a Tensor / creating a tensor

Creating a tensor directly from data (a nested list):

data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

Creating a tensor from a NumPy array:

np_array = np.array(data)
x_np = torch.from_numpy(np_array)

Creating a tensor from another tensor (unless explicitly overridden, the new tensor retains the properties (shape, data type) of the argument tensor):

x_ones = torch.ones_like(x_data) 
# Retain the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) 
# overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Out:

Ones Tensor:
 tensor([[1, 1],
        [1, 1]])

Random Tensor:
 tensor([[0.3277, 0.7579],
        [0.1860, 0.8509]])

Creating a tensor of a given shape, filled with constant or random values:

shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Out:

Random Tensor:
 tensor([[0.2882, 0.0322, 0.4411],
        [0.5961, 0.6428, 0.1681]])

Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

Attributes of a tensor

Tensor attributes describe their shape, dtype, and the device on which they are stored.

tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Out:

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
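
By default tensors are created on the CPU; they can be moved to the GPU explicitly with .to (a small sketch of my own, which only does something if a GPU is available):

# Move the tensor to the GPU if one is available; otherwise it stays on the CPU
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
print(f"Device tensor is stored on: {tensor.device}")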

Operations on tensors

Standard indexing and slicing, with syntax similar to Python/NumPy or MATLAB:

tensor = torch.ones(4, 4)
print('First row: ', tensor[0])
print('First column: ', tensor[:, 0])
print('Last column:', tensor[..., -1])
tensor[:,1] = 0
print(tensor)

Out:

First row:  tensor([1., 1., 1., 1.])
First column:  tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Joining tensors (concatenation):

t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

Out:

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
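
A related operation is torch.stack, which joins tensors along a new dimension instead of an existing one (a small sketch of my own):

t2 = torch.stack([tensor, tensor, tensor], dim=0)
print(t2.shape)  # torch.Size([3, 4, 4]) -- a new leading dimension, unlike torch.cat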

Arithmetic operations

# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
# matrix multiplication of the tensor with its transpose; y1, y2 and y3 all hold the same result
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)


# This computes the element-wise product. z1, z2, z3 will have the same value
# element-wise (Hadamard) product of the tensor with itself
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

Single-element tensors: if you have a one-element tensor, for example from aggregating all the values of a tensor into one value, you can convert it to a Python value using item()

agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

Out:

12.0 <class 'float'>

In-place operations: operations that store the result into the operand are called in-place operations. They are denoted by a _ suffix. For example, x.copy_(y) and x.t_() will change x

print(tensor, "\n")
tensor.add_(5)
print(tensor)

Out:

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])

Datasets & DataLoaders in detail

Code for processing data samples can get messy and hard to maintain; ideally we want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives, torch.utils.data.DataLoader and torch.utils.data.Dataset, that allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
The PyTorch domain libraries provide many pre-loaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to prototype and benchmark your model.

Loading a dataset: the relevant parameters, taking the FashionMNIST dataset as an example

  • root is the path where the train/test data is stored; in short, the location where the dataset is placed or will be downloaded to

  • train specifies whether to use the training or the test dataset

  • download=True downloads the data from the internet if it's not available at root

  • transform and target_transform specify the feature and label transformations, i.e. which transforms to apply

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Visualizing the dataset

We can manually index the dataset like a list: training_data[index]. We use matplotlib to visualize some samples in the training data.

labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

Create a custom dataset

A custom Dataset class must implement three functions: __init__, __len__, and __getitem__.

Take the FashionMNIST images as an example: the images are stored in the directory img_dir, and their labels are stored separately in the CSV file annotations_file.

import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

The __init__ function

The __init__ function runs once when instantiating the Dataset object. We initialize the directory containing the images, the annotations file, and both transforms

The labels.csv file looks like this, for example:

tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9

def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file, names=['file_name', 'label'])
    self.img_dir = img_dir
    self.transform = transform
    self.target_transform = target_transform

The __len__ function returns the number of samples in our dataset.

def __len__(self):
    return len(self.img_labels)

The __getitem__ function loads and returns a sample from the dataset at the given index idx. Based on the index, it identifies the image's location on disk, converts it to a tensor with read_image, retrieves the corresponding label from the CSV data in self.img_labels, calls the transform functions on them (if applicable), and returns the tensor image and the corresponding label in a tuple.

def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
    image = read_image(img_path)
    label = self.img_labels.iloc[idx, 1]
    if self.transform:
        image = self.transform(image)
    if self.target_transform:
        label = self.target_transform(label)
    return image, label

Preparing the data for training with DataLoaders

The Dataset retrieves our dataset's features and labels one sample at a time. While training a model, we typically want to pass samples in minibatches, reshuffle the data at every epoch to reduce model overfitting, and use Python's multiprocessing to speed up data retrieval.
DataLoader is an iterable that abstracts this complexity for us behind a simple API.

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

Iterating through the DataLoader

We have loaded the dataset into the DataLoader and can iterate through it as needed. Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively). Because we specified shuffle=True, the data is shuffled after we iterate over all batches

# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")![../../_images/sphx_glr_data_tutorial_002.png](https://pytorch.org/tutorials/_images/sphx_glr_data_tutorial_002.png)

Out:

Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
Label: 5

Introduction to TRANSFORMS

In short, transforms manipulate the data to make it suitable for our network: data does not always come in the final processed form required for training machine learning algorithms, so we use transforms to perform some manipulation of the data and make it suitable for training.

All TorchVision datasets have two parameters: transform to modify the features and target_transform to modify the labels. They accept callables containing the transformation logic. The torchvision.transforms module offers several commonly used transforms out of the box.

Fortunately, PyTorch already integrates good transform functions, so we don't have to bother writing new ones

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

ToTensor()

ToTensor() converts a PIL image or NumPy ndarray to a FloatTensor and scales the image's pixel intensity values into the range [0., 1.]
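
The Lambda target_transform used above turns an integer label y into a one-hot encoded tensor: it creates a zero tensor of size 10 (the number of classes) and uses scatter_ to write a 1 at the index given by the label. A small sketch of my own showing the effect:

one_hot = Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
print(one_hot(3))  # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])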

Build the network

Neural networks consist of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks needed to build your own neural network. Every module in PyTorch is a subclass of nn.Module. A neural network is itself a module, composed of other modules (layers). This nested structure allows you to easily build and manage complex architectures.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Select the device (CPU or GPU) to use

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

Define the class

We define our neural network by subclassing nn.Module and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
print(model)

Network structure: flatten, then linear layer → ReLU → linear layer → ReLU → linear layer

Out:

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

To use the model, we pass it the input data. This executes the model's forward, along with some background operations. Do not call model.forward() directly!
Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

The role of each module can be seen in combination with the 3Blue1Brown MLP explanation mentioned in the earlier article

Take a minibatch of three images of size 28x28 and see what happens as we pass it through the network.

input_image = torch.rand(3,28,28)
print(input_image.size())

Out:

torch.Size([3, 28, 28])

nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension at dim=0 is maintained).

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

Out:

torch.Size([3, 784])

nn.Linear

nn.Linear is a module that applies a linear transformation to the input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

Out:

torch.Size([3, 20])

nn.ReLU

Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.
In this model, we use nn.ReLU between our linear layers, but there are other activations that introduce nonlinearity into your model.

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Out:

Before ReLU: tensor([[-0.5059,  0.0748, -0.3764,  0.2702, -0.5308,  0.1879, -0.1396, -0.1809,
         -0.0651,  0.1935,  0.8745,  0.3594, -0.0366,  0.4182, -0.4431,  0.2117,
         -0.2114, -0.0045, -0.2030, -0.5195],
        [-0.6142, -0.0293,  0.1397,  0.2526, -0.2365,  0.2625, -0.2154, -0.1611,
         -0.0842, -0.0181,  0.8274,  0.0739,  0.2244,  0.3389, -0.2915,  0.0280,
          0.0083, -0.3871, -0.3059, -0.0009],
        [-0.4356,  0.0336,  0.0208,  0.5318, -0.4322,  0.2168,  0.1233, -0.2511,
         -0.0217, -0.0147,  0.6722,  0.0350,  0.0696,  0.6650, -0.3241,  0.0591,
         -0.1873,  0.0044, -0.1151, -0.3288]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.0748, 0.0000, 0.2702, 0.0000, 0.1879, 0.0000, 0.0000, 0.0000,
         0.1935, 0.8745, 0.3594, 0.0000, 0.4182, 0.0000, 0.2117, 0.0000, 0.0000,
         0.0000, 0.0000],
        [0.0000, 0.0000, 0.1397, 0.2526, 0.0000, 0.2625, 0.0000, 0.0000, 0.0000,
         0.0000, 0.8274, 0.0739, 0.2244, 0.3389, 0.0000, 0.0280, 0.0083, 0.0000,
         0.0000, 0.0000],
        [0.0000, 0.0336, 0.0208, 0.5318, 0.0000, 0.2168, 0.1233, 0.0000, 0.0000,
         0.0000, 0.6722, 0.0350, 0.0696, 0.6650, 0.0000, 0.0591, 0.0000, 0.0044,
         0.0000, 0.0000]], grad_fn=<ReluBackward0>)

nn.Sequential

nn.Sequential is an ordered container of modules. Data passes through all the modules in the same order as defined. You can use sequential containers to quickly put together a network, such as seq_modules below.

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

nn.Softmax

The last linear layer of the neural network returns logits, raw values in [-infty, infty], which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1], representing the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
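
As a quick check of the dim argument (a sketch of my own): the probabilities along dim=1 should sum to 1 for every sample in the minibatch.

print(pred_probab.sum(dim=1))  # approximately tensor([1., 1., 1.]) -- one entry per image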

Model parameters

Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object and makes all parameters accessible via the model's parameters() or named_parameters() methods.
We iterate over each parameter and print its size and its values

print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Out:

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0088,  0.0077,  0.0342,  ..., -0.0352, -0.0216,  0.0057],
        [ 0.0218,  0.0252, -0.0121,  ..., -0.0119, -0.0242, -0.0097]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0026, -0.0315], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0361, -0.0067, -0.0310,  ...,  0.0283,  0.0308,  0.0301],
        [ 0.0113, -0.0117,  0.0122,  ..., -0.0277,  0.0144, -0.0243]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0320,  0.0118], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[ 0.0066, -0.0132, -0.0337,  ...,  0.0186, -0.0261, -0.0128],
        [ 0.0329,  0.0164,  0.0112,  ...,  0.0183, -0.0094,  0.0095]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([0.0200, 0.0429], device='cuda:0', grad_fn=<SliceBackward0>)

Automatic differentiation using TORCH.AUTOGRAD

When training neural networks, the most commonly used algorithm is back propagation. In this algorithm, the parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.
To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradients for any computational graph.
Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following way:

import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

Look at the code carefully; the loss here is binary cross-entropy computed from the logits
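
For reference (my own note, using the standard definition), binary_cross_entropy_with_logits applies a sigmoid to z and then averages the binary cross-entropy over the elements:

$$\mathrm{loss}(z, y) = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[y_i \log \sigma(z_i) + (1 - y_i)\log\bigl(1 - \sigma(z_i)\bigr)\Bigr], \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$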

Tensor, function and computational graph

This code defines the following computational graph (shown as a figure in the original tutorial): x and w feed into a matrix multiplication, b is added to give z, and z together with y produces the loss through the binary cross-entropy function.

In short, if you have studied signals and systems or automatic control theory, it is very similar to a signal-flow diagram

In this network, w and b are the parameters we need to optimize. Therefore, we need to be able to compute the gradients of the loss function with respect to those variables. In order to do that, we set the requires_grad attribute of those tensors.

A function that we apply to tensors to construct the computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the grad_fn attribute of a tensor.

print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Out:

Gradient function for z = <AddBackward0 object at 0x7f1127a750b8>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f1127a750b8>

Calculating Derivatives

To optimize the weights of the parameters in the neural network, we need to compute the derivatives of the loss function with respect to the parameters, namely $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$, under some fixed values of x and y. To compute those derivatives, we call loss.backward() and then retrieve the values from w.grad and b.grad

loss.backward()
print(w.grad)
print(b.grad)

Out:

tensor([[0.0093, 0.0089, 0.1828],
        [0.0093, 0.0089, 0.1828],
        [0.0093, 0.0089, 0.1828],
        [0.0093, 0.0089, 0.1828],
        [0.0093, 0.0089, 0.1828]])
tensor([0.0093, 0.0089, 0.1828])

Note

  • We can only obtain the grad properties for the leaf nodes of the computational graph that have requires_grad set to True. For all other nodes in our graph, gradients will not be available.

  • For performance reasons, we can only perform gradient calculations using backward once on a given graph. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call, as in the sketch below.
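
A small sketch of my own illustrating the second point: calling backward a second time on the same graph raises an error unless retain_graph=True was passed on the first call.

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward(retain_graph=True)  # keep the graph so we can backpropagate through it again
loss.backward()                   # works; without retain_graph above, this second call would raise a RuntimeError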

Some details

Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, the leaves are the input tensors and the roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

In a forward pass, autograd does two things simultaneously:

  • runs the requested operation to compute a resulting tensor
  • maintains the operation's gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

  • computes the gradients from each .grad_fn,
  • accumulates them in the respective tensor's .grad attribute,
  • using the chain rule, propagates all the way to the leaf tensors.

OPTIMIZING MODEL PARAMETERS (optimizers)

Now that we have a model and data, it's time to train, validate, and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration (called an epoch), the model makes a guess about the output, calculates the error of its guess (the loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent.
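
The steps of a single optimization iteration look roughly like this in code (a condensed sketch of my own, reusing the model, device, and train_dataloader defined earlier in these notes):

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for X, y in train_dataloader:
    X, y = X.to(device), y.to(device)
    pred = model(X)            # 1. the model makes a guess about the output
    loss = loss_fn(pred, y)    # 2. calculate the error (loss) of the guess
    optimizer.zero_grad()      # reset gradients accumulated from the previous iteration
    loss.backward()            # 3. collect the derivatives of the error w.r.t. the parameters
    optimizer.step()           # 4. adjust the parameters by gradient descent
    break                      # one step only, for illustration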
