Recurrent Neural Network Learning


Recurrent Neural Network (RNN)


At time t, given the current input data x_t, the recurrent neural network combines it with the hidden encoding h_t−1 obtained at the previous time step t−1 and produces the hidden encoding h_t of the current time step as follows:

h_t = Φ(U × x_t + W × h_t−1)

Here Φ(·) is an activation function, which can be either Sigmoid or Tanh, allowing the model to forget irrelevant information and update its memory content at the same time. U and W are model parameters. As you can see, the hidden encoding h_t output at the current time step is related not only to the current input data x_t but is also inseparably linked to the network's existing "memory" h_t−1.
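As a minimal sketch of this recurrence (the sizes and the choice of tanh below are only illustrative, not taken from the text):

import numpy as np

def rnn_step(x_t, h_prev, U, W):
    # One step of h_t = Φ(U × x_t + W × h_t−1), with Φ = tanh in this sketch
    return np.tanh(U @ x_t + W @ h_prev)

# Toy sizes: 4-dimensional input, 3-dimensional hidden state
U = np.random.randn(3, 4)
W = np.random.randn(3, 3)
h = np.zeros(3)
for x_t in np.random.randn(5, 4):  # a toy sequence of length 5
    h = rnn_step(x_t, h, U, W)     # the same U and W are reused at every time step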

Taking part-of-speech tagging as an example, the schematic diagram shows the input sequence (x_1, ..., x_t−1, x_t, x_t+1, ..., x_T) being processed by a recurrent neural network.


Parameter W_x maps x_t to the hidden encoding h_t, parameter W_o maps h_t to the predicted output o_t, and h_t−1 participates in generating h_t through parameter W_h. In the diagram, W_x, W_o and W_h are reused (shared) parameters.

It can also be understood as follows:

Here the state S(t−1) from the previous time step participates in computing the state S(t) at the current time step.

Long Short-Term Memory Network (LSTM)

For this section you can watch the detailed explanation video on Bilibili; here is my summary of the LSTM principle.

The main point in understanding LSTM is the computation of its several gates:

First, look at the overall formulas:

The specific operations are as follows: the first four equations correspond to the four activation-function operations below. Each one fuses the hidden information h(t−1) from the previous step with the current input information x(t) (a feature-fusion operation). The forget weight f(t) is then multiplied by the cell state C(t−1) of the previous step, and the newly input feature information i(t) multiplied by g(t) (where i(t) × g(t) acts as a feature-selection operation) is added to it, giving the new cell state C(t). The sixth step then produces the output of the current hidden state. Finally, the hidden state is passed through a linear layer, and a softmax operation turns it into a classification result.
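To make this concrete, here is a minimal single-step sketch of the gate computations (bias terms are omitted and the weight names are chosen for illustration; nn.LSTM stacks these matrices internally):

import torch

def lstm_step(x_t, h_prev, c_prev, W_xi, W_hi, W_xf, W_hf, W_xg, W_hg, W_xo, W_ho):
    i_t = torch.sigmoid(x_t @ W_xi + h_prev @ W_hi)  # input gate
    f_t = torch.sigmoid(x_t @ W_xf + h_prev @ W_hf)  # forget gate
    g_t = torch.tanh(x_t @ W_xg + h_prev @ W_hg)     # candidate cell features
    o_t = torch.sigmoid(x_t @ W_xo + h_prev @ W_ho)  # output gate
    c_t = f_t * c_prev + i_t * g_t                   # forget old memory, add selected new features
    h_t = o_t * torch.tanh(c_t)                      # output of the current hidden state
    return h_t, c_t

A linear layer applied to h_t followed by softmax then gives the classification result, as described above.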

On the question of dimensions:
For example, take an input sequence (x_1, ..., x_t−1, x_t, x_t+1, ..., x_T). Each x_t can be seen as a word; in NLP tasks a word is usually mapped to a word vector, for example of shape (1, 512), denoted here as 1 × d. The dimensions of i(t), f(t) and o(t) in the LSTM are the same as that of h(t), denoted 1 × h, which you set yourself when creating the LSTM module. So, relating this to the formulas above, what we train are the weight matrices: W(i,i), W(i,f) and so on, which act on the input, are d × h, while W(h,i), W(h,f) and the other matrices that operate on the hidden state are h × h.
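As a quick check of these shapes in PyTorch (the sizes 512 and 256 below are only illustrative; nn.LSTM stores the weights transposed relative to the d × h description and stacks the four gates along the first dimension):

import torch.nn as nn

d, h = 512, 256  # word-vector size and hidden size, chosen for illustration
lstm = nn.LSTM(input_size=d, hidden_size=h, batch_first=True)
print(lstm.weight_ih_l0.shape)  # torch.Size([4 * h, d]) -> input-to-hidden weights W(i,·)
print(lstm.weight_hh_l0.shape)  # torch.Size([4 * h, h]) -> hidden-to-hidden weights W(h,·)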

RNN / LSTM Handwritten Digit Recognition

# 1. Import required libraries
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



# 2. Define a helper function: display a batch of data
def imshow(inp, title=None):
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])  # mean
    std = np.array([0.229, 0.224, 0.225])  # standard deviation
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)  # Clip values to the 0-1 range
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.show()
    # plt.pause(1)




# 3. Define the RNN model
class RNN_Model(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNN_Model, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize the hidden state with zeros: (layer_dim, batch_size, hidden_dim)
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)
        # Detach the hidden state so gradients do not propagate through it (helps avoid exploding gradients)
        out, hn = self.rnn(x, h0.detach())
        out = self.fc(out[:, -1, :])
        return out


# 4. Define the LSTM model
class LSTM_Model(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTM_Model, self).__init__()  # Initialize the parent class
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        # Build the LSTM layer
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize the hidden state with zeros
        # (layer_dim, batch_size, hidden_dim)
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)
        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)
        # Detach the hidden and cell states so gradients do not propagate through them (helps avoid exploding gradients)
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        # Only the output of the last time step is needed
        out = self.fc(out[:, -1, :])
        return out

def train(train_loader, test_loader, net, loss_function, learning_rate, plot_loss, plot_accuracy, model_name):
    model = net

    # Define the optimizer
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    # Model training
    sequence_dim = 28  # Sequence length (each 28-pixel image row is one time step)
    loss_list = []  # Save losses
    accuracy_list = []  # Save accuracies
    iteration_list = []  # Save iteration counts

    iter = 0
    for epoch in range(EPOCHS):
        for i, (images, labels) in enumerate(train_loader):
            model.train()  # Switch to training mode
            # Reshape the batch into RNN input dimensions (batch, sequence, input)
            images = images.view(-1, sequence_dim, input_dim).requires_grad_().to(device)
            labels = labels.to(device)
            # Gradient Zeroing (otherwise it will continue to accumulate)
            optimizer.zero_grad()
            # Forward propagation
            outputs = model(images)
            # Calculate loss
            loss = loss_function(outputs, labels)
            # Backward propagation
            loss.backward()
            # Update parameters
            optimizer.step()
            # Increment the iteration counter
            iter += 1
            # Model validation
            if iter % 500 == 0:
                model.eval()  # Switch to evaluation mode
                # Compute accuracy on the test set
                correct = 0
                total = 0
                # Iterate over the test set and predict
                with torch.no_grad():
                    for images, labels in test_loader:
                        images = images.view(-1, sequence_dim, input_dim).to(device)
                        labels = labels.to(device)
                        # Model prediction
                        outputs = model(images)
                        # Index of the class with the highest predicted probability
                        predict = torch.max(outputs.data, 1)[1]
                        # Count the test-set size
                        total += labels.size(0)
                        # Count correct predictions
                        correct += (predict == labels).sum().item()
                # Compute the accuracy as a percentage
                accuracy = correct / total * 100
                # Save accuracy, loss and iteration count
                loss_list.append(loss.item())
                accuracy_list.append(accuracy)
                iteration_list.append(iter)
                # Print Information
                print("loop : {}, Loss : {}, Accuracy : {}".format(iter, loss.item(), accuracy))
    if plot_loss:
        # Visualize the loss
        plt.plot(iteration_list, loss_list)
        plt.xlabel('Number of Iteration')
        plt.ylabel('Loss')
        plt.title(model_name)
        plt.show()
    if plot_accuracy:
        # Visualize the accuracy
        plt.plot(iteration_list, accuracy_list, color='r')
        plt.xlabel('Number of Iteration')
        plt.ylabel('Accuracy')
        plt.title(model_name)
        plt.savefig('{}_mnist.png'.format(model_name))
        plt.show()


if __name__ == "__main__":
    # 5. Download the MNIST dataset
    trainsets = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())  # convert images to tensors

    testsets = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())


    print(trainsets.data.shape)
    # 6. Define hyperparameters
    BATCH_SIZE = 32  # Number of samples read per batch
    EPOCHS = 10  # Train for 10 epochs

    # 7. Create data loaders for the datasets, i.e. read the data batch by batch
    train_loader = torch.utils.data.DataLoader(dataset=trainsets, batch_size=BATCH_SIZE, shuffle=True)

    test_loader = torch.utils.data.DataLoader(dataset=testsets, batch_size=BATCH_SIZE, shuffle=True)

    # View batch data
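    # Optional preview of one batch, added here as an illustrative snippet: it uses the
    # imshow helper defined above, and torchvision.utils.make_grid tiles the images into one grid.
    inputs, classes = next(iter(train_loader))
    imshow(torchvision.utils.make_grid(inputs[:8]), title=[c.item() for c in classes[:8]])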


    # 8. Initialize the models
    input_dim = 28  # Input dimension (28 pixels per image row)
    hidden_dim = 100  # Hidden dimension (number of hidden units)
    layer_dim = 2  # Two stacked recurrent layers
    output_dim = 10  # Output dimension (10 digit classes)

    model_rnn = RNN_Model(input_dim, hidden_dim, layer_dim, output_dim)
    model_lstm = LSTM_Model(input_dim, hidden_dim, layer_dim, output_dim)

    # Determine if there is a GPU
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model_lstm.to(device=device)
    #  Define loss function
    criterion = nn.CrossEntropyLoss()

    train(train_loader, test_loader, model_lstm, criterion, learning_rate=0.01, plot_loss=True, plot_accuracy=True, model_name="lstm")
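
    # The RNN model defined above can be trained in exactly the same way. This extra call
    # is an addition for completeness and reuses the LSTM hyperparameters:
    model_rnn.to(device=device)
    train(train_loader, test_loader, model_rnn, criterion, learning_rate=0.01, plot_loss=True, plot_accuracy=True, model_name="rnn")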








