Contents
Find the linear regression function by gradient descent
Several important concepts (about data processing)
Linear regression
Linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative relationship of interdependence between two or more variables. It is very widely used.
When the regression analysis involves only one independent variable and one dependent variable, and the relationship between them can be approximated by a straight line, it is called univariate linear regression analysis. If the regression analysis involves two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression analysis.
Univariate linear regression
From the definition above, we know that univariate linear regression involves a single independent variable X and a single dependent variable Y, and the relationship between them can be approximated by a straight line. It is a special case of regression: when there is only one independent variable and the relationship is linear, we speak of univariate linear regression.
First, what does "linear" mean? Generally speaking, the graph we get is a straight line, and the highest-order term of the independent variable is 1. For example, y = 2x + 1 is linear, while y = x² is not.
Fitting means constructing a function (an algorithm) so that it matches the real data as well as possible.
From the fitting point of view, linear regression constructs a linear function whose outputs agree with the target values as well as possible. From the spatial point of view, the line (or plane) of the function should be as close as possible to all the data points (the sum of the distances from the points to the line, measured parallel to the y-axis, should be smallest).
Linear regression model:
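The model formula itself is not reproduced above; as a minimal sketch for the univariate case (using the θ notation that the code examples later in this post call Theta0 and Theta1), the model is

$$h_\theta(x) = \theta_0 + \theta_1 x,$$

where θ0 is the intercept and θ1 is the slope to be learned from the data.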
For example, if we have data on house areas and the corresponding prices, we can fit a straight line so that it matches the relationship between area and price as closely as possible (that is, the sample points fall as close as possible to the fitted line).
So how should we fit this line? We first need to know the loss function.
Loss function
The loss function is also called the objective function or cost function. In short, it is a function of the error: it measures the difference between the model's predicted values and the true values. The goal of machine learning is to set up a loss function and then minimize its value.
In other words, the loss function is a function of the model parameters, and the number of possible parameter combinations is usually infinite. Our goal is to find, among all these combinations, the one that makes the value of the loss function smallest.
The loss function evaluates the difference between the predicted values and the true values: the smaller the loss, the better the model performs. Different models generally use different loss functions.
Generally speaking, the error of one sample is obtained by subtracting the predicted value given by the model from the true value of the data; it estimates how far the model's prediction f(x) is from the true value Y.
The formula for a single sample is: error = true value − predicted value
For univariate linear regression, we usually use the mean squared error loss function:
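The formula is not reproduced above; a standard form of the mean squared error loss for the univariate model, consistent with the house-price code later in this post (which divides the sum of squared errors by 2m), is

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2,$$

where m is the number of samples, x^{(i)} is the i-th input and y^{(i)} the corresponding true value.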
Now that we have a loss function, how can we find its minimum value? Here we will use the gradient descent method.
Gradient descent
Gradient descent is widely used in machine learning. Whether in linear regression or logistic regression, its main purpose is to find the minimum of the objective function by iteration, or to converge to that minimum.
Partial derivatives:
For a function of one variable, the derivative is the rate of change of the function. For a function of two variables, the "rate of change" is more complicated, because there is one more independent variable.
In the xOy plane, when a moving point leaves P(x0, y0) in different directions, the function f(x, y) generally changes at different speeds. Therefore, we need to study the rate of change of f(x, y) in different directions at the point (x0, y0).
Here, we only study the rate of change of f(x, y) along two special directions: parallel to the x-axis and parallel to the y-axis.
The symbol for a partial derivative is ∂. A partial derivative reflects the rate of change of the function along the positive direction of one coordinate axis. The gradient points in the direction of the largest directional derivative, that is, the direction in which the function value changes fastest.
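As a short sketch of the notation (standard definitions, not shown in the original): for a function of two variables, the gradient collects the two partial derivatives into a vector,

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right),$$

and this vector points in the direction in which f increases fastest, so moving against it decreases f fastest.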
Gradient descent:
We can use going down a mountain as an example (here we quote an example given by others):
Imagine the following scenario: a person is trapped on a mountain and needs to get down (to find the lowest point of the mountain, that is, the valley). But the fog on the mountain is very thick, so visibility is low and the path down cannot be seen; he has to use the information around him to find the way down step by step. At this point he can use the gradient descent algorithm to help himself. How? First, taking his current position as the reference, he finds the steepest direction at this position and takes one step downhill; then, taking his new position as the reference, he again finds the steepest direction and takes another step; and so on, until he finally reaches the lowest point.
In short: use the information at the current position to iterate step by step toward the location of the minimum.
From calculus we know that the direction of the gradient (derivative) is the direction in which the function value changes fastest, so differentiation is essential. And just as when going down the mountain, once we have the direction we must also consider the size of each step: the step size is another important parameter of gradient descent. When we have both the direction and the step size, we can walk step by step to the bottom of the mountain, that is, to the parameters of the minimum point of the function we are looking for.
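Putting the direction and the step size together gives the basic gradient descent update rule (a sketch of the standard form, where $\eta$ denotes the learning rate, i.e. the step size):

$$x_{k+1} = x_k - \eta \, \nabla f(x_k)$$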
The following are schematic diagrams of the one-dimensional and multi-dimensional cases:
General steps of gradient descent:
Let's look at two specific examples (using gradient descent to find the lowest point of a function):
Manual derivation:
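The derivation appears to have been an image; as a sketch consistent with the code below, which uses J(x) = 0.5x² − 2x + 3:

$$J'(x) = x - 2 = 0 \;\Rightarrow\; x = 2, \qquad J(2) = 0.5 \cdot 4 - 4 + 3 = 1,$$

so the minimum is at x = 2 with value 1, which is what gradient descent should converge to.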
Code example:
Univariate case:
import numpy as np
import matplotlib.pyplot as plt

plot_x = np.linspace(-1, 6, 141)                     # 141 equally spaced points between -1 and 6
plot_y = 0.5 * plot_x * plot_x - 2 * plot_x + 3      # corresponding function values
plt.plot(plot_x, plot_y)
plt.show()

# Derivative of the quadratic function J
def dJ(x):
    return x - 2

# Value of the function J
def J(x):
    try:
        return 0.5 * x * x - 2 * x + 3
    except:
        return float('inf')

x = 0.0            # starting point (chosen arbitrarily)
eta = 0.1          # learning rate
i = 0
epsilon = 1e-8     # threshold used to decide whether the minimum has been reached
history_x = [x]    # records the x coordinates visited by gradient descent

while True:
    i = i + 1
    gradient = dJ(x)          # gradient (derivative) at the current point
    last_x = x
    x = x - eta * gradient
    print("Iteration %d: function value %f, x = %f, change %f"
          % (i, J(last_x), x, abs(J(last_x) - J(x))))
    history_x.append(x)
    if abs(J(last_x) - J(x)) < epsilon:   # stop when the change is small enough
        break

print(history_x)   # the x values visited; the last one is near the minimum
plt.plot(plot_x, plot_y)
plt.plot(np.array(history_x), J(np.array(history_x)), color='r', marker='*')  # trajectory of x
plt.show()
Resulting plot:
Bivariate case:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Partial derivative with respect to x
def dJx(x):
    return 2 * x - 20

# Partial derivative with respect to y
def dJy(y):
    return 2 * y - 20

# The function to minimize
def z(x, y):
    return (x - 10) ** 2 + (y - 10) ** 2

x = 100.0
y = 100.0
eta = 0.1
i = 0
epsilon = 1e-8
history_x = [x]
history_y = [y]
history_z = [z(x, y)]

while True:
    i = i + 1
    gradientx = dJx(x)        # partial derivative with respect to x
    gradienty = dJy(y)        # partial derivative with respect to y
    last_x = x
    last_y = y
    x = x - eta * gradientx
    y = y - eta * gradienty
    print("Iteration %d: function value %f, x = %f, y = %f, change %f"
          % (i, z(last_x, last_y), x, y, abs(z(last_x, last_y) - z(x, y))))
    history_x.append(x)
    history_y.append(y)
    history_z.append(z(x, y))
    if abs(z(last_x, last_y) - z(x, y)) < epsilon:   # stop when the change is small enough
        break

print(history_x)   # x values visited
print(history_y)   # y values visited
print(history_z)   # function values visited

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Coordinate grid from -100 to 100 with step 0.1
x = np.arange(-100, 100, 0.1)
y = np.arange(-100, 100, 0.1)
X, Y = np.meshgrid(x, y)
Z = (X - 10) ** 2 + (Y - 10) ** 2
plt.xlabel("x")
plt.ylabel("y")
ax.plot_surface(X, Y, Z)
ax.plot(history_x, history_y, history_z, 'ko', lw=2, ls='-')   # descent trajectory
plt.show()
Resulting plot:
Find the linear regression function by gradient descent:
Main steps of the parameter update:
1. Randomly initialize a set of parameters θ.
2. Take the objective function J(θ) and compute its partial derivative with respect to each parameter θ (this can also be understood as the steepest downhill direction at the current position).
3. Subtract the derivative at the old value, multiplied by the step size, from the old value to get the new value:
θ_new = θ_old − a * f'(θ_old)
where a is the learning rate (it can also be understood as the size of each step down the mountain),
and b is the change of the parameter in each iteration: b = θ_old − θ_new.
4. The number of iterations is generally controlled in two ways: either a number of loops fixed in advance, or stopping when b (the change) is smaller than a specific value. Note: a must be set by yourself; it should be neither too large nor too small. Too large a value easily causes inaccuracy (overshooting the minimum), while too small a value easily causes too many iterations. A minimal sketch of this update loop is given right after this list.
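The following is a minimal Python sketch of this update loop, assuming a single parameter theta and a hypothetical derivative function dJ (the names gradient_descent, max_iters and min_change are illustrative, not from the original post); it combines both stopping rules:

def gradient_descent(dJ, theta, a=0.01, max_iters=10000, min_change=1e-8):
    # Sketch of the update rule: theta_new = theta_old - a * dJ(theta_old)
    for _ in range(max_iters):                    # stopping rule 1: fixed number of loops
        theta_old = theta
        theta = theta_old - a * dJ(theta_old)     # step against the derivative
        b = abs(theta_old - theta)                # change of the parameter in this iteration
        if b < min_change:                        # stopping rule 2: change smaller than a threshold
            break
    return theta

# Example with the quadratic used earlier: J(x) = 0.5*x**2 - 2*x + 3, so dJ(x) = x - 2
print(gradient_descent(lambda t: t - 2, theta=0.0, a=0.1))   # converges to about 2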
Let's look at a specific example (using gradient descent to find the parameters of the minimum point of a loss function):
Let's continue with an example of linear regression (a homework exercise):
Code example:
m = 7                   # size of the data set
alpha = 0.000001        # learning rate
area = [150, 200, 250, 300, 350, 400, 600]              # data set: house areas
price = [6450, 7450, 8450, 9450, 11450, 15450, 18450]   # corresponding prices

# Partial derivative of the loss with respect to Theta0
def gradientx(Theta0, Theta1):
    ans = 0
    for i in range(0, 7):
        ans = ans + Theta0 + Theta1 * area[i] - price[i]
    ans = ans / m
    return ans

# Partial derivative of the loss with respect to Theta1
def gradienty(Theta0, Theta1):
    ans = 0
    for i in range(0, 7):
        ans = ans + (Theta0 + Theta1 * area[i] - price[i]) * area[i]
    ans = ans / m
    return ans

# Loss function (sum of squared errors divided by 2m)
def loss(Theta0, Theta1):
    ans = 0
    for i in range(0, 7):
        ans = ans + pow((Theta0 + Theta1 * area[i] - price[i]), 2)
    ans = ans / (2 * m)
    return ans

nowTheta0 = 1700    # initial values
nowTheta1 = 60
print('Initial parameters')
print(nowTheta0, nowTheta1)

# Gradient descent: iterate a fixed number of times, or stop early when the loss is small enough
for i in range(500000):
    nowa = nowTheta0
    nowTheta0 = nowTheta0 - alpha * gradientx(nowTheta0, nowTheta1)
    nowTheta1 = nowTheta1 - alpha * gradienty(nowa, nowTheta1)
    if loss(nowTheta0, nowTheta1) < 100.0:
        break

print('Fitted parameters')
print(nowTheta0, nowTheta1)

import numpy as np
from matplotlib import pyplot

pyplot.scatter(area, price)
x = np.arange(0, 700, 0.1)          # x values for plotting the fitted line
y = nowTheta1 * x + nowTheta0
pyplot.plot(x, y)
pyplot.xlabel('area')
pyplot.ylabel('price')
pyplot.show()
Fitting results:
Several important concepts (about data processing)
1. Normalization: rescales the data into a fixed range, typically [0, 1], using the minimum and maximum values (a small numpy sketch of points 1 and 2 is given after this list).
2. Standardization: rescales the data so that it has zero mean and unit variance.
3. Regularization: adds a penalty on the size of the model parameters to the loss function in order to reduce overfitting.
For more details on regularization, see the CSDN blog post "understanding of regularization".
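As a small illustrative sketch of points 1 and 2 (the numbers reuse the house areas from the example above; this code is not from the original post):

import numpy as np

data = np.array([150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 600.0])   # e.g. the house areas used earlier

# Normalization (min-max scaling): maps the data into the range [0, 1]
normalized = (data - data.min()) / (data.max() - data.min())

# Standardization (z-score): zero mean and unit variance
standardized = (data - data.mean()) / data.std()

print(normalized)
print(standardized)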
Several common libraries:
The usage of the following libraries will not be described in detail here.
import pandas as pd
import numpy as np (used for arrays and matrices)
An introduction to common numpy usage can be found in the CSDN blog post "use of the numpy library (python learning notes)".
import matplotlib.pyplot as plt (used for plotting and data visualization)
An introduction to common matplotlib usage can be found in the CSDN blog post "Python--Matplotlib (basic usage)".