PyTorch MLP network tutorial

"""
Define your neural network architecture
 Initialize optimizer and loss function
 Cycle your training periods
 Cycle data batches in each period
 Forecast the current batch data and calculate the loss
 Zero gradient
 Perform back propagation
 Tell the optimizer to update the gradient of the network
 tell PyTorch use GPU Train your network (of course, if there is one on your machine) GPU)
"""
# import the necessary packages
from collections import OrderedDict
import torch.nn as nn

def get_training_model(inFeatures=4, hiddenDim=8, nbClasses=3):
	# construct a shallow, sequential neural network
	mlpModel = nn.Sequential(OrderedDict([
		("hidden_layer_1", nn.Linear(inFeatures, hiddenDim)),
		("activation_1", nn.ReLU()),
		("output_layer", nn.Linear(hiddenDim, nbClasses))
	]))
    
	# return the sequential model
	return mlpModel

Lines 2 and 3 import the Python packages we need:
Then we define the get_training_model function (line 5), which accepts three parameters:
The number of input nodes to the neural network
The number of nodes in the network's hidden layer
The number of output nodes (i.e., the dimensionality of the output predictions)
Based on the default values provided, you can see that we are building a 4-8-3 neural network: 4 nodes in the input layer, 8 nodes in the hidden layer, and an output consisting of 3 values.
Then, on lines 7-11, we first initialize an nn.Sequential object (very similar to Keras/TensorFlow's Sequential class).
Inside the nn.Sequential we build an OrderedDict, in which each entry contains two values:
A string with a human-readable name for the layer (useful when debugging neural network architectures with PyTorch)
The PyTorch layer definition itself
The Linear class is our fully connected layer definition: every input is connected to every output in the layer. The Linear class accepts two required parameters:
The number of inputs to the layer
The number of outputs
On line 8, we define hidden_layer_1, which consists of a fully connected layer that accepts inFeatures inputs and produces hiddenDim outputs.
From there, we apply a ReLU activation function (line 9), followed by another linear layer as our output (line 10).

Note that the second Linear definition has the same number of inputs as the previous Linear layer has outputs - this is not accidental!
The output dimension of one layer must match the input dimension of the next, otherwise PyTorch will raise an error (and you will then have the rather tedious task of debugging the layer dimensions yourself).

PyTorch is unforgiving in this regard (unlike Keras/TensorFlow), so be extra careful when specifying your layer dimensions.

Then the generated PyTorch neural network is returned to the calling function.
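As a quick sanity check (not part of the original scripts), you can instantiate the model from a Python shell and print it; this assumes mlp.py is saved inside the pyimagesearch module, exactly as it is imported by the training script below:

# quick sanity check: instantiate the model and inspect its layers
from pyimagesearch import mlp

model = mlp.get_training_model(inFeatures=4, hiddenDim=8, nbClasses=3)
print(model)
# Sequential(
#   (hidden_layer_1): Linear(in_features=4, out_features=8, bias=True)
#   (activation_1): ReLU()
#   (output_layer): Linear(in_features=8, out_features=3, bias=True)
# )

The printed layer names come straight from the OrderedDict keys, which is exactly why naming each layer is handy for debugging.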

"""
Create our PyTorch Training script 
With our neural network architecture implemented, we can move on to training the model with PyTorch.

To complete this task, we need to implement a training script:

Create an instance of our neural network architecture
 Build our dataset
 Determine whether we are training our model on a GPU
 Define a training loop (the hardest part of our script)

Open train.py and let's get started:
"""
# import the necessary packages(train.py)
from pyimagesearch import mlp
from torch.optim import SGD
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
import torch.nn as nn
import torch

def next_batch(inputs, targets, batchSize):
	# loop over the dataset
	for i in range(0, inputs.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (inputs[i:i + batchSize], targets[i:i + batchSize])

# specify our batch size, number of epochs, and learning rate
BATCH_SIZE = 64
EPOCHS = 10
LR = 1e-2

# determine the device we will be using for training
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print("[INFO] training using {}...".format(DEVICE))

# generate a 3-class classification problem with 1000 data points,
# where each data point is a 4D feature vector
print("[INFO] preparing data...")
(X, y) = make_blobs(n_samples=1000, n_features=4, centers=3,
	cluster_std=2.5, random_state=95)

# create training and testing splits, and convert them to PyTorch
# tensors
(trainX, testX, trainY, testY) = train_test_split(X, y,
	test_size=0.15, random_state=95)
trainX = torch.from_numpy(trainX).float()
testX = torch.from_numpy(testX).float()
trainY = torch.from_numpy(trainY).float()
testY = torch.from_numpy(testY).float()

# initialize our model and display its architecture
mlp = mlp.get_training_model().to(DEVICE)
print(mlp)

# initialize optimizer and loss function
opt = SGD(mlp.parameters(), lr=LR)
lossFunc = nn.CrossEntropyLoss()

# create a template to summarize current training progress
trainTemplate = "epoch: {} test loss: {:.3f} test accuracy: {:.3f}"

# loop through the epochs
for epoch in range(0, EPOCHS):
	# initialize tracker variables and set our model to trainable
	print("[INFO] epoch: {}...".format(epoch + 1))
	trainLoss = 0
	trainAcc = 0
	samples = 0
	mlp.train()
    
	# loop over the current batch of data
	for (batchX, batchY) in next_batch(trainX, trainY, BATCH_SIZE):
		# flash data to the current device, run it through our
		# model, and calculate loss
		(batchX, batchY) = (batchX.to(DEVICE), batchY.to(DEVICE))
		predictions = mlp(batchX)
		loss = lossFunc(predictions, batchY.long())
        
		# zero the gradients accumulated from the previous steps,
		# perform backpropagation, and update model parameters
		opt.zero_grad()
		loss.backward()
		opt.step()
        
		# update training loss, accuracy, and the number of samples
		# visited
		trainLoss += loss.item() * batchY.size(0)
		trainAcc += (predictions.max(1)[1] == batchY).sum().item()
		samples += batchY.size(0)
        
	# display model progress on the current training batch
	trainTemplate = "epoch: {} train loss: {:.3f} train accuracy: {:.3f}"
	print(trainTemplate.format(epoch + 1, (trainLoss / samples),
		(trainAcc / samples)))
    
	# initialize tracker variables for testing, then set our model to
	# evaluation mode
	testLoss = 0
	testAcc = 0
	samples = 0
	mlp.eval()
    
	# initialize a no-gradient context
	with torch.no_grad():
		# loop over the current batch of test data
		for (batchX, batchY) in next_batch(testX, testY, BATCH_SIZE):
			# flash the data to the current device
			(batchX, batchY) = (batchX.to(DEVICE), batchY.to(DEVICE))
            
			# run data through our model and calculate loss
			predictions = mlp(batchX)
			loss = lossFunc(predictions, batchY.long())
            
			# update test loss, accuracy, and the number of
			# samples visited
			testLoss += loss.item() * batchY.size(0)
			testAcc += (predictions.max(1)[1] == batchY).sum().item()
			samples += batchY.size(0)
            
		# display model progress on the current test batch
		testTemplate = "epoch: {} test loss: {:.3f} test accuracy: {:.3f}"
		print(testTemplate.format(epoch + 1, (testLoss / samples),
			(testAcc / samples)))
		print("")
        
        

Lines 2-7 import the Python packages we need, including:

mlp: our definition of the multi-layer perceptron architecture, implemented in PyTorch
SGD: the stochastic gradient descent optimizer we will use to train the model
make_blobs: builds a synthetic dataset of example data
train_test_split: splits our dataset into training and testing subsets
nn: PyTorch's neural network functionality
torch: the base PyTorch library

When training our neural network, we will do so in batches (as you have learned previously). The next_batch function generates such batches for our training loop:
The function accepts three arguments:

inputs: the data we input into the neural network
targets: our target output value (that is, the value we want the neural network to accurately predict)
batchSize: the size of the data batch

We then loop over our input data in steps of batchSize (line 11) and yield the current batch of data and labels to the calling function (line 13).
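To make the slicing behavior concrete, here is a small illustrative run of next_batch on dummy tensors (the tensor values are made up purely for demonstration; next_batch is the generator defined above):

# illustration only: next_batch yields consecutive slices of batchSize samples
import torch

dummyInputs = torch.arange(10).reshape(10, 1)   # 10 samples, 1 feature each
dummyTargets = torch.arange(10)

for (x, y) in next_batch(dummyInputs, dummyTargets, batchSize=4):
	print(x.shape, y.shape)
# torch.Size([4, 1]) torch.Size([4])
# torch.Size([4, 1]) torch.Size([4])
# torch.Size([2, 1]) torch.Size([2])

Notice that the final batch may be smaller than batchSize when the dataset size is not evenly divisible by the batch size.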
Next, we need to handle some important initialization:

When training our neural network with PyTorch, we will train for 10 epochs with a batch size of 64 and use a learning rate of 1e-2 (lines 16-18).
We set up our training device (CPU or GPU) on line 21. The GPU will certainly speed up the training, but it is not necessary in this example.

Next, we need an example dataset to train our neural network on. In the next tutorial in this series, we will learn how to load images from disk and train a neural network on image data;
but for now, let's use scikit-learn's make_blobs function to create a synthetic dataset for us:

Lines 27 and 28 build our dataset, including:
Three class labels (centers=3)
Four total features/inputs to the neural network (n_features=4)
A total of 1000 data points (n_samples=1000)

In essence, the make_blobs function generates Gaussian blobs of clustered data points. For 2D data, the make_blobs function would create data similar to the following:

Note that there are three clusters of data here. We are doing the same thing, but with four dimensions instead of two (which means we cannot easily visualize it).
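If you would like to reproduce a 2D plot like this yourself, the following sketch generates and scatters a 2D version of the same kind of dataset (it assumes matplotlib is installed; matplotlib is not used anywhere else in this tutorial):

# illustration only (assumes matplotlib is installed): plot a 2D version
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

(X2d, y2d) = make_blobs(n_samples=1000, n_features=2, centers=3,
	cluster_std=2.5, random_state=95)
plt.scatter(X2d[:, 0], X2d[:, 1], c=y2d, s=10)
plt.title("make_blobs with three centers (2D)")
plt.show()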

After generating the data, we apply the train_test_split function (lines 32 and 33) to create our training split, with 85% of the data for training and 15% for evaluation.
From there, the training and testing data are converted from NumPy arrays to PyTorch tensors and cast to the floating-point data type (lines 34-37).

Now let's instantiate our PyTorch neural network architecture:
Line 40 initializes our MLP and pushes it to any device we use for training (CPU or GPU).

Line 44 defines our SGD optimizer, which accepts two parameters:

The parameters of our MLP model, obtained by simply calling parameters()
The learning rate
Finally, we initialize our categorical cross-entropy loss function, which is the standard loss you will use when performing classification with more than two classes.
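As a quick illustration of why the training loop later casts the labels with batchY.long(): nn.CrossEntropyLoss expects raw, unnormalized logits as predictions and integer class indices as targets. The values below are made up purely for demonstration:

# illustration only: CrossEntropyLoss takes raw logits and integer labels
import torch
import torch.nn as nn

lossFunc = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, -1.0],   # one row of class scores per sample
	[0.1, 1.5, 0.3]])
labels = torch.tensor([0, 1])              # ground-truth class indices (int64)
print(lossFunc(logits, labels))            # scalar loss averaged over the batch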

We now reach our most important code block: the training loop. Unlike Keras/TensorFlow, which allow you to simply call model.fit() to train your model,
PyTorch requires that you implement your training loop by hand.

There are pros and cons to having to implement the training loop by hand.

On the one hand, you have complete control over the training procedure, which makes it easier to implement custom training loops.

On the other hand, implementing a training loop by hand requires more code and, worst of all, makes it far easier to get yourself into trouble (especially for novice deep learning practitioners).

My suggestion: you will want to read the explanation of the following code block several times to understand the intricacies of the training loop. In particular, pay close attention to how we zero the gradients, perform backpropagation, and then update the model parameters - performing these steps out of order will lead to erroneous results!

Let's review our training loop:

Line 48 initializes trainTemplate, a string that lets us conveniently display the epoch number along with the loss and accuracy at each step.

Then, we loop over our desired number of training epochs on line 51. Inside this loop we:

Display the epoch number, which is useful for debugging (line 53)
Initialize our training loss and accuracy (lines 54 and 55)
Initialize the total number of data points used in the current iteration of the training loop (line 56)
Put the PyTorch model in training mode (line 57)
Calling the train() method of the PyTorch model is required for the model parameters to be updated during backpropagation.

In our next code block, you will see that we put the model into eval() mode so that we can evaluate the loss and accuracy on our testing set. If we then forgot to put the model
back into train() mode at the top of the next training loop, our model parameters would not be updated.

The outer for loop iterates over our number of epochs (line 51). Line 60 then starts an inner loop over each batch in the training set.
Nearly every training procedure you write with PyTorch will contain an outer loop (over the number of epochs) and an inner loop (over the data batches).

Inside the inner loop (i.e., the batch loop), we proceed to:

Move the batchX and batchY data to our CPU or GPU (depending on our device)
Pass the batchX data through the neural network to make predictions on it
Compute our loss by comparing the output predictions to our ground-truth class labels with our loss function
Now that we have our loss, we can update our model parameters - this is the most important step in the PyTorch training procedure, and often the one that beginners screw up.

To update the parameters of our model, we must execute lines 69-71 in the exact order specified:

opt.zero_grad(): zeroes the gradients accumulated from the previous batch/step of the model
loss.backward(): performs backpropagation
opt.step(): updates the weights of our neural network based on the results of backpropagation
Again, you must zero the gradients, perform the backward pass, and then update the model parameters in the exact order I indicated.

As I mentioned, PyTorch gives you a lot of control over your training loop, but it also makes it very easy to shoot yourself in the foot. Every deep learning practitioner,
from novices just entering the field to seasoned experts, has screwed up these steps at some point.
The most common mistake is forgetting to zero the gradients. If you do not zero the gradients, you will accumulate them across multiple batches and epochs. That will disrupt your backpropagation and lead to erroneous weight updates.

Seriously, don't screw up these steps. Write them on post it notes and put them on your monitor if necessary.
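If you want to convince yourself of why zeroing the gradients matters, here is a tiny standalone demonstration (not part of train.py) showing that PyTorch accumulates gradients in .grad until they are explicitly cleared:

# illustration only: gradients accumulate in .grad until explicitly zeroed
import torch

w = torch.ones(1, requires_grad=True)
(w * 3).sum().backward()
print(w.grad)      # tensor([3.])
(w * 3).sum().backward()
print(w.grad)      # tensor([6.]) -- the old gradient was NOT cleared
w.grad.zero_()     # this is what opt.zero_grad() does for every model parameter
(w * 3).sum().backward()
print(w.grad)      # tensor([3.]) again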

After we update the weights of the model, we compute our training loss, training accuracy, and the number of samples seen (i.e., the number of data points in the batch) on lines 75-77.

We then use our trainTemplate to display our epoch number, training loss, and training accuracy. Note how we divide the loss and accuracy by the total number of samples seen in the epoch to obtain averages.
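To make the accuracy computation concrete, here is a small illustrative example (with made-up logits) of what predictions.max(1)[1] does in line 76 of the training script:

# illustration only: turning raw logits into predicted class labels
import torch

predictions = torch.tensor([[2.0, 0.5, -1.0],   # highest score at index 0
	[0.1, 1.5, 0.3]])                            # highest score at index 1
batchY = torch.tensor([0, 2])
predLabels = predictions.max(1)[1]               # tensor([0, 1])
print((predLabels == batchY).sum().item())       # 1 correct prediction out of 2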

At this point, we have trained our PyTorch model on all data points in an epoch - now we need to evaluate it on our test set:

Similar to how we initialize our training loss, training accuracy and the number of samples in the batch, we do the same for the test set on lines 86-88. Here, we initialize variables to store our test loss, test accuracy, and the number of samples in the test set.

We also put our model into eval() mode on line 89.
We are required to put our model into evaluation mode whenever we compute the loss/accuracy on the testing or validation set.

But what does eval() mode actually do? You can think of evaluation mode as a switch that turns off layer-specific training behaviors,
for example, disabling dropout, or using the accumulated batch normalization statistics rather than per-batch statistics.

Secondly, eval() is typically used together with a torch.no_grad() context, meaning that gradient computation is turned off while in evaluation mode (line 92).
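As a small illustration of what evaluation mode toggles (our MLP contains no dropout layers, so this is purely for demonstration), consider a standalone dropout layer:

# illustration only -- our MLP has no dropout, but this shows what eval() toggles
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

layer.train()
print(layer(x))   # training mode: roughly half the values are zeroed,
                  # the survivors are scaled by 1 / (1 - p) = 2

layer.eval()
print(layer(x))   # evaluation mode: dropout is a no-op, all ones pass through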

From there, we loop through all the batches in the test set (line 94), similar to looping through the training batches in the previous code block.
For each batch (line 96), we use our model to predict and then calculate the loss (lines 99 and 100).
Then update testLoss, testAcc, and the number of samples (lines 104-106).
Finally, we display our epoch number, test loss and test accuracy on the terminal (lines 109-112).
In general, the evaluation portion of our training loop is very similar to the training portion, with a few subtle but very important changes:
We use eval() to put the model into evaluation mode
We use the torch.no_grad() context to ensure that no gradient computation is performed
From there, we can use our model to predict and calculate the accuracy / loss on the test set.

Our first few lines of output show the simple 4-8-3 MLP architecture, meaning that the neural network has four inputs, a single hidden layer with eight nodes, and a final output layer with three nodes.
We then trained our network for a total of 10 epochs. At the end of the training process, we obtained 99.1% accuracy on the training set and 98% accuracy on the testing set.
We can therefore conclude that our neural network is doing a good job of making accurate predictions.
Congratulations on training your first neural network with PyTorch!
How do you train a PyTorch model on your own custom dataset?
This tutorial showed you how to train a PyTorch neural network on an example dataset generated by scikit-learn's make_blobs function.
While this is a great example for learning the fundamentals of PyTorch, it is admittedly not very exciting from a real-world scenario perspective.
