The previous article, "Pytorch tutorials [pytorch official tutorial in Chinese and English] - 4 Transforms", introduced the use of Transforms for data conversion. Next, let's see how to build a model.
Original link: Build the Neural Network — PyTorch Tutorials 1.10.1+cu102 documentation
BUILD THE NEURAL NETWORK
Neural networks comprise of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses the nn.Module. A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.
In the following sections, we'll build a neural network to classify images in the FashionMNIST dataset.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
1 Get Device for Training
We want to be able to train our model on a hardware accelerator like the GPU, if it is available. Let's check to see if torch.cuda is available, else we continue to use the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')
Output results:
Using cuda device
2 Define the Class
We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
We create an instance of NeuralNetwork, move it to the device, and print its structure.
model = NeuralNetwork().to(device)
print(model)
The output is as follows:
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
To use the model, we pass it the input data. This executes the model's forward, along with some background operations. Do not call model.forward() directly!
Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
Output results:
Predicted class: tensor([1], device='cuda:0')
3 Model Layers
Let's break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.
input_image = torch.rand(3,28,28)
print(input_image.size())
Output:
torch.Size([3, 28, 28])
3.1 nn.Flatten
We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, at dim=0, is maintained).
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
Output:
torch.Size([3, 784])
3.2 nn.Linear
The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
Output:
torch.Size([3, 20])
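nn.Linear keeps these parameters as layer1.weight (shape [20, 784]) and layer1.bias (shape [20]). As a small sketch, not part of the original tutorial, we can reproduce the layer's output by hand to see that it is just a matrix multiplication plus a bias:

# Sketch: reproduce the linear layer's output from its stored parameters
# y = x @ W.T + b, where W is layer1.weight and b is layer1.bias
manual = flat_image @ layer1.weight.T + layer1.bias
print(torch.allclose(manual, hidden1))  # True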
3.3 nn.ReLU
Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.
In this model, we use nn.ReLU between our linear layers, but there are other activations that introduce non-linearity in your model.
print(f"Before ReLU: {hidden1}\n\n") hidden1 = nn.ReLU()(hidden1) print(f"After ReLU: {hidden1}")
Output:
Before ReLU: tensor([[-0.2541, -0.1397,  0.2342,  0.1364, -0.0437,  0.3759,  0.2808, -0.0619,
          0.2780,  0.2830, -0.4725,  0.4298,  0.2717, -0.1618, -0.0604,  0.3242,
         -0.5874, -0.5922, -0.2481, -0.4181],
        [-0.1339, -0.1163,  0.1688,  0.1112,  0.1179,  0.3560,  0.0990, -0.1398,
          0.2619, -0.1023, -0.7150, -0.1186,  0.3338, -0.0817,  0.1983, -0.2084,
         -0.3889, -0.2361, -0.0752, -0.2144],
        [-0.1284,  0.0683,  0.0707,  0.0997, -0.2274,  0.4379,  0.1461,  0.0949,
          0.2710, -0.0563, -0.6621, -0.3552,  0.4966,  0.2304,  0.0020, -0.0470,
         -0.6260, -0.2077, -0.0790, -0.4635]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.0000, 0.2342, 0.1364, 0.0000, 0.3759, 0.2808, 0.0000, 0.2780,
         0.2830, 0.0000, 0.4298, 0.2717, 0.0000, 0.0000, 0.3242, 0.0000, 0.0000,
         0.0000, 0.0000],
        [0.0000, 0.0000, 0.1688, 0.1112, 0.1179, 0.3560, 0.0990, 0.0000, 0.2619,
         0.0000, 0.0000, 0.0000, 0.3338, 0.0000, 0.1983, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000],
        [0.0000, 0.0683, 0.0707, 0.0997, 0.0000, 0.4379, 0.1461, 0.0949, 0.2710,
         0.0000, 0.0000, 0.0000, 0.4966, 0.2304, 0.0020, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000]], grad_fn=<ReluBackward0>)
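As an illustration only (not part of the original tutorial), other activation modules are used exactly the same way as nn.ReLU():

# Illustration: alternative non-linearities, applied like nn.ReLU()
pre_act = layer1(flat_image)   # recompute pre-activation values (hidden1 was overwritten above)
print(nn.Tanh()(pre_act))      # squashes values into (-1, 1)
print(nn.GELU()(pre_act))      # a smooth alternative to ReLU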
3.4 nn.Sequential
nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
3.5 nn.Softmax
The last linear layer of the neural network returns logits - raw values in [-infty, infty] - which are passed to the nn.Softmax module. The logits are scaled to values [0, 1] representing the model's predicted probabilities for each class. dim parameter indicates the dimension along which the values must sum to 1.
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
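Because dim=1 is the class dimension here, each row of pred_probab should sum to 1. A quick sanity check (a sketch, not part of the original tutorial):

# Each of the 3 samples gets a probability distribution over the 10 classes
print(pred_probab.sum(dim=1))  # each entry is (approximately) 1.0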
4 Model Parameters
Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model's parameters() or named_parameters() methods.
In this example, we iterate over each parameter, and print its size and a preview of its values.
print("Model structure: ", model, "\n\n") for name, param in model.named_parameters(): print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
Output:
Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0169,  0.0327, -0.0128,  ..., -0.0273,  0.0193, -0.0197],
        [ 0.0309,  0.0003, -0.0232,  ...,  0.0284, -0.0163,  0.0171]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0060, -0.0333], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0294,  0.0120, -0.0287,  ..., -0.0280, -0.0299,  0.0083],
        [ 0.0260, -0.0075,  0.0430,  ..., -0.0196, -0.0200,  0.0145]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0003, -0.0043], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0287, -0.0199, -0.0147,  ...,  0.0074,  0.0403,  0.0068],
        [ 0.0375, -0.0005,  0.0372,  ..., -0.0426, -0.0094, -0.0081]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([-0.0347,  0.0438], device='cuda:0', grad_fn=<SliceBackward0>)
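As a complementary sketch (not part of the original tutorial), parameters() also makes it easy to count the total number of trainable parameters:

# Sum the element counts of all trainable parameter tensors
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters: {total_params}")  # 669,706 for this model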
5 Further Reading
Note: these are study notes; if there are any mistakes, please point them out! Writing articles is not easy, so please contact me before reprinting.