Gradient descent, overfitting and normalization

Good courses deserve to be shared with more people: the AI video list from Shangxuetang. Click any entry and you will find a Baidu netdisk download link bundling the whole course series, with videos, code, and materials, all free and high quality. Of course, there is plenty being shared these days; MOOCs, blogs, forums and the like make it easy to find knowledge of every kind. How far we get depends on ourselves. I hope I can keep it up. Let's go!

Gradient descent method

Have a look at this Jianshu article: Gradient Descent and Its Implementation, Explained in Simple Terms.

 

Batch gradient descent

 

· initialize w: assign w a random starting value

· iterate in the direction of the negative gradient; each update of w makes the loss function J(w) smaller

· if w has only a few hundred dimensions, the closed-form solution can still be computed directly (e.g. via SVD); gradient descent is generally used once w has more than a few hundred dimensions (a minimal sketch of the closed-form alternative follows this list)

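As a point of comparison for the SVD remark above, here is a minimal sketch (not from the original post) of the closed-form least-squares solution; np.linalg.pinv computes the Moore-Penrose pseudo-inverse via SVD, and X_b and y are built the same way as in the batch gradient descent code below.

# Closed-form least squares via the pseudo-inverse (computed internally with SVD)
import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]   # add the bias column of ones

theta_closed_form = np.linalg.pinv(X_b).dot(y)   # equivalent to solving the normal equation
print(theta_closed_form)   # should be close to [[4], [3]]

This is practical only while the number of features stays small, which is exactly why the post falls back to gradient descent for higher-dimensional w.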

 

# Batch gradient descent
import numpy as np

# Generate some toy linear data: y = 4 + 3x + Gaussian noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]   # add the bias column of ones

learning_rate = 0.1     # learning rate; step size = learning rate x gradient
n_iterations = 1000     # number of iterations; usually no convergence threshold is set, just a fixed iteration count as a hyperparameter
m = 100     # number of samples

theta = np.random.randn(2, 1)   # randomly initialize the parameters theta = (w0, w1)
count = 0   # iteration counter

for iteration in range(n_iterations):
    count += 1
    # Compute the gradient over the whole batch
    gradients = 1/m * X_b.T.dot(X_b.dot(theta) - y)
    # Update theta by stepping in the negative gradient direction
    theta = theta - learning_rate * gradients
    # print(count, theta)

print(count, theta)
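To check the claim above that each update makes the loss J(w) smaller, here is a small optional sketch (not part of the original code) that records the MSE after every update; it reuses X_b, y, m, learning_rate and n_iterations as defined above.

# Optional: track the MSE loss J(theta) at each iteration to confirm it decreases
losses = []
theta = np.random.randn(2, 1)
for iteration in range(n_iterations):
    gradients = 1/m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - learning_rate * gradients
    losses.append(np.mean((X_b.dot(theta) - y) ** 2))   # J(theta) as the mean squared error

print(losses[0], losses[-1])   # the final loss should be far smaller than the initial one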

 

Stochastic gradient descent

· stochastic gradient descent is usually the preferred choice

· the noise in stochastic gradient descent can sometimes help it jump out of a local minimum

 

import numpy as np

# Same toy data as above: y = 4 + 3x + Gaussian noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

n_epochs = 500          # number of passes over the data
t0, t1 = 5, 50          # hyperparameters of the learning-rate schedule
m = 100                 # number of samples

def learning_schedule(t):
    # Gradually decay the learning rate as training progresses
    return t0 / (t + t1)

# Randomly initialize the parameters
theta = np.random.randn(2, 1)

for epoch in range(n_epochs):
    for i in range(m):
        # Pick one random sample and compute the gradient on it alone
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        # Decay the learning rate, then update theta
        learning_rate = learning_schedule(epoch*m + i)
        theta = theta - learning_rate * gradients

print(theta)
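For comparison, scikit-learn ships a stochastic gradient descent regressor with its own learning-rate schedule; the following is a rough sketch, assuming a recent scikit-learn is installed (it is not part of the original post, and the hyperparameters shown are illustrative).

# Rough equivalent using scikit-learn's SGDRegressor
import numpy as np
from sklearn.linear_model import SGDRegressor

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())                   # SGDRegressor expects a 1-D target
print(sgd_reg.intercept_, sgd_reg.coef_)    # should be close to 4 and 3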
