These notes are divided into two parts: Part 1 covers the video lectures and Part 2 covers the code exercises.
Part 1: Video learning
Video 1 Introduction
1️⃣ Turing test
An evaluator outside the "black box" judges whether what is inside is a human or a machine.
Ex: CAPTCHA (verification-code) systems.
2️⃣ GAN
During World War II, the Allies cracked Germany's Enigma system by simulating the process by which Enigma generated its ciphertext. The same idea underlies today's generative adversarial networks (GAN).
Keywords: deep learning model, unsupervised learning, (at least) two modules (a generative model and a discriminative model)
In the original GAN theory, G and D are not required to be neural networks; they only need to be functions that can fit generation and discrimination respectively. In practice, however, deep neural networks are generally used for both G and D.
Discriminative model: given a picture, judge whether the animal in it is a cat or a dog.
Generative model: given a set of cat pictures, generate a new cat picture (one not in the data set).
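As a minimal illustration (my own sketch, not code from the video), the two modules can be written as two small PyTorch networks; the layer sizes and the 784-dimensional "image" are arbitrary assumptions:

```python
import torch
from torch import nn

# Minimal sketch of the two GAN modules; sizes are arbitrary assumptions.
class Generator(nn.Module):
    def __init__(self, noise_dim=64, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),   # fake sample with values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),      # probability that the input is real
        )

    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
z = torch.randn(8, 64)        # a batch of noise vectors
print(D(G(z)).shape)          # torch.Size([8, 1])
```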
3️⃣ Three levels of artificial intelligence
4️⃣ AI > machine learning > deep learning
5️⃣ Logical deduction and induction
6️⃣ Machine learning
Keywords: find an approximate solution.
In practical applications, features are often more important than classifiers.
- Model, strategy and method.
Model: the assumptions made about the input-to-output mapping to be learned.
From the existing data set we know the relationship between inputs and outputs; based on this known relationship an optimal model is trained.
What are supervised, unsupervised and reinforcement learning?
"Parameters" here refer to the parameters of the data distribution; a parametric model makes assumptions about the data distribution in advance.
A generative model models the joint distribution of the input and the output.
A discriminative model models the conditional distribution of the output given the input.
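In symbols (standard notation, not from the video; X is the input, Y the output/label):

$\text{Generative model: } P(X, Y) \qquad \text{Discriminative model: } P(Y \mid X) \qquad P(Y \mid X) = \frac{P(X, Y)}{P(X)}$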
- Common methods of machine learning
- Manually designed features
Feature extraction: extract effective features from the raw data.
Feature transformation: further process the features, e.g. raise or reduce their dimension.
Feature extraction methods: PCA, LDA
Feature selection methods: mutual information, TF-IDF
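As a quick, hedged sketch of these steps (my own example, assuming scikit-learn is installed; the video shows no code), PCA for feature extraction and TF-IDF for feature weighting might look like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Feature extraction / transformation: project 10-D data down to 2 principal components
X = np.random.randn(100, 10)
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)          # (100, 2)

# Feature weighting: TF-IDF scores for a toy corpus
docs = ["a cat sat on the mat", "a dog sat on the log"]
tfidf = TfidfVectorizer().fit_transform(docs)
print(tfidf.shape)         # (2, vocabulary size)
```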
- Traditional machine learning and deep learning
A. Rule-based systems: hand-designed program
B. Classic machine learning: hand-designed features → mapping from features
C. Simple representation learning: features (learned from data) → mapping from features
D. Deep learning: simple features → more complex features → mapping from features
From A to D, the role of humans weakens and the machine's ability to learn on its own grows.
Video 2 Overview of deep learning
1️⃣ The "can'ts" of deep learning
1. The algorithm's output is unstable and easily "attacked"
Ex: adversarial examples, where changing a single pixel can flip the prediction.
2. The model is highly complex and difficult to correct and debug
3. The model hierarchy is highly complex and the parameters are opaque
Ex: Inception (GoogLeNet), residual structures, gating structures
Inception
Residual (ResNet)
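For reference, a minimal residual block in PyTorch (my own sketch of the idea, not the actual GoogLeNet/ResNet code; the channel counts are arbitrary):

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the skip connection lets gradients bypass the two conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # add the identity shortcut before the final activation

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 32, 32])
```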
4. End-to-end training depends heavily on data, and the model is hard to extend incrementally
$\text{Test loss} - \text{Training loss} \leq \sqrt{\frac{N}{m}}$

m: number of training samples
N: effective capacity of the model
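Plugging in numbers of my own choosing just to read the bound: with effective capacity $N = 10^4$ and $m = 10^6$ training samples,

$\text{Test loss} - \text{Training loss} \leq \sqrt{\frac{10^{4}}{10^{6}}} = 0.1$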
5. Focuses on intuitive perception problems and can do little for open reasoning problems
6. Human knowledge cannot be effectively introduced or supervised, so machine bias is hard to avoid
2️⃣ Interpretability and generalization
3️⃣ M-P neuron
$y = f\left(\sum_{i=1}^{n} w_i x_i - \Theta\right)$
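A minimal sketch of an M-P neuron (my own illustration; here f is a step activation, and the weights and threshold are chosen by hand to realize an AND gate):

```python
import torch

def mp_neuron(x, w, theta):
    """M-P neuron: y = f(sum_i w_i * x_i - theta), with a step activation f."""
    return (torch.dot(w, x) - theta >= 0).float()

# AND gate as an M-P neuron: fires only when both inputs are 1
w, theta = torch.tensor([1.0, 1.0]), 1.5
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, mp_neuron(torch.tensor([float(a), float(b)]), w, theta).item())
```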
4️⃣ Activation functions
1. Linear function (Ex: identity function)
2. Ramp function
3. Threshold function (Ex: step function)
4. Sign function
5️⃣ Sigmoid function
$y(x) = \frac{1}{1 + e^{-x}}$

$y'(x) = y(x)\,(1 - y(x))$
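A quick numeric check of this derivative identity using autograd (my own sketch; the input value 0.5 is arbitrary):

```python
import torch

x = torch.tensor(0.5, requires_grad=True)
y = torch.sigmoid(x)
y.backward()

# Autograd gradient vs the closed-form y(x) * (1 - y(x))
print(x.grad.item())            # ~0.2350
print((y * (1 - y)).item())     # same value
```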
6️⃣ Single-layer perceptron
My understanding: unlike the fixed M-P neuron, a perceptron that can learn does not have its parameter w preset; w is learned from data.
7️⃣ Multilayer perceptron
A perceptron with multiple layers of neurons, i.e. with multiple hidden layers.
8️⃣ Universal approximation theorem
If the hidden layer contains enough neurons, a three-layer feedforward neural network (input layer, hidden layer, output layer) can approximate any given continuous function to arbitrary accuracy.
y = a(W · x + b)
This completes the transformation from the input space to the output space:
W · x: raise / reduce the dimension; scale; rotate
+ b: translate
a(·): bend (the nonlinearity)
Making the network deeper or wider does not necessarily improve the result.
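The single transformation y = a(W · x + b) written out in PyTorch (a sketch; the sizes 2 and 5 are arbitrary assumptions):

```python
import torch
from torch import nn

x = torch.randn(4, 2)    # 4 samples, 2-D input
W = nn.Linear(2, 5)      # W * x + b: maps 2-D input to 5-D (change dimension, scale, rotate, translate)
a = torch.tanh           # a(.): the nonlinear "bending"

y = a(W(x))              # y = a(W * x + b)
print(y.shape)           # torch.Size([4, 5])
```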
9️⃣ Vanishing gradients
Backpropagation multiplies many derivatives together via the chain rule; when each factor is small, the product approaches 0 and the early layers barely update.
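A small sketch (my own) that makes the effect visible: the gradient reaching the input shrinks sharply as more sigmoid layers are stacked.

```python
import torch
from torch import nn

for depth in [2, 10, 30]:
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(10, 10), nn.Sigmoid()]
    net = nn.Sequential(*layers)

    x = torch.randn(1, 10, requires_grad=True)
    net(x).sum().backward()
    print(depth, x.grad.abs().mean().item())   # gets much smaller as depth grows
```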
🔟 Error backpropagation (BP)
$F(x) = f_n(f_{n-1}(\cdots f_2(f_1(x) \cdot \Theta_1 + b_1) \cdot \Theta_2 + b_2 \cdots))$
Compare the actual output with the target output: weights that push the result away from the target are decreased, otherwise they are increased.
The size of each adjustment is controlled by the learning rate.
Therefore the BP algorithm easily overfits (it performs well on the training set but poorly on new data). This can be mitigated by early stopping: fit the parameters on the training set and monitor the error on a validation set, stopping when the validation error stops improving. A sketch is given below.
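A hedged sketch of such an early-stopping loop (my own example; model, train_loader, val_loader, patience and the SGD settings are hypothetical, not from the video):

```python
import copy
import torch
from torch import nn, optim

def train_with_early_stopping(model, train_loader, val_loader, patience=5, max_epochs=100):
    """Stop when the validation loss has not improved for `patience` epochs."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1)
    best_val, best_state, bad_epochs = float("inf"), None, 0

    for epoch in range(max_epochs):
        model.train()
        for xb, yb in train_loader:              # fit parameters on the training set
            optimizer.zero_grad()
            criterion(model(xb), yb).backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():                    # monitor the error on the validation set
            val_loss = sum(criterion(model(xb), yb).item() for xb, yb in val_loader)

        if val_loss < best_val:
            best_val, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:           # stop early: further training would overfit
                break

    model.load_state_dict(best_state)            # keep the weights with the best validation loss
    return model
```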
Part 2: Code learning
Basic operations in PyTorch
```python
import torch

x = torch.tensor(666)
print(x)
```
```python
# One-dimensional
x = torch.tensor([1, 2, 3, 4, 5, 6])

# Two-dimensional: matrix with two rows and three columns
x = torch.ones(2, 3)

# Arbitrary dimensions
x = torch.ones(2, 3, 4)

# Create an uninitialized tensor (here 5 rows, 3 columns)
x = torch.empty(5, 3)

# Create a randomly initialized tensor
x = torch.rand(5, 3)

# Create a tensor of all zeros with dtype long
x = torch.zeros(5, 3, dtype=torch.long)

# Initialize a new tensor from a known tensor
y = x.new_ones(5, 3)

# Redefine the dtype of the original tensor
z = torch.randn_like(x, dtype=torch.float)

# Indexing (m and v are example tensors: a 2x4 matrix and a length-4 vector)
m = torch.tensor([[2., 5., 3., 7.], [4., 2., 1., 9.]])
v = torch.arange(1., 5.)
print(m[0][2])        # element at row 0, column 2
print(m[:, 1])        # column 1

# Matrix-vector multiplication
print(m @ v)

# Matrix transpose (3 methods)
print(m.t())
print(m.transpose(0, 1))
print(m.permute(1, 0))
```
Spiral classification
```python
import random
import math

import torch
from torch import nn, optim
from IPython import display
from plot_lib import plot_data, plot_model, set_default

# Because Colab provides a GPU, torch will run on the GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('device: ', device)

# Initialize the random seed. The parameters of the neural network are randomly initialized,
# and different initializations often lead to different results. When a good result is obtained
# we usually want it to be reproducible, so in PyTorch this is done by setting the random seed.
seed = 12345
random.seed(seed)
torch.manual_seed(seed)

N = 1000  # number of samples per class
D = 2     # feature dimension of each sample
C = 3     # number of classes
H = 100   # number of hidden-layer units in the neural network
```
torch.manual_seed(seed) makes the run reproducible: once the seed is set, the script produces the same output every time it is run.
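A tiny check (my own) that the seed really fixes the random stream:

```python
import torch

torch.manual_seed(12345)
a = torch.rand(3)
torch.manual_seed(12345)
b = torch.rand(3)
print(torch.equal(a, b))   # True: same seed, same numbers
```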
All tensors are copied to the GPU specified by device as soon as the data is created, so the subsequent operations run on the GPU.
Multi-GPU
```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model()   # Model stands for your own nn.Module subclass
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1, 2])
model.to(device)
```
The snippet above is the code for the multi-GPU case: nn.DataParallel splits each batch across the listed GPUs.
```python
X = torch.zeros(N * C, D).to(device)
Y = torch.zeros(N * C, dtype=torch.long).to(device)

for c in range(C):
    index = 0
    t = torch.linspace(0, 1, N)  # N numbers spaced evenly over [0, 1]
    # The details below are not important; in short, three classes of samples
    # (which together form a spiral) are computed from the formula.
    # torch.randn(N) draws N random numbers with mean 0 and variance 1; note the difference from rand
    inner_var = torch.linspace((2 * math.pi / C) * c, (2 * math.pi / C) * (2 + c), N) + torch.randn(N) * 0.2
    # The (x, y) coordinates of each sample are stored in X
    # The class labels of the samples, in {0, 1, 2}, are stored in Y
    for ix in range(N * c, N * (c + 1)):
        X[ix] = t[index] * torch.FloatTensor((math.sin(inner_var[index]), math.cos(inner_var[index])))
        Y[ix] = c
        index += 1

print("Shapes:")
print("X:", X.size())
print("Y:", Y.size())
```
torch.linspace(0, 1, N) returns N evenly spaced points covering the interval [0, 1].
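For example (a quick check of my own):

```python
import torch

print(torch.linspace(0, 1, 5))   # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```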