Chapter 5 Logistic Regression Algorithm
Building on the linear regression algorithm, the logistic regression algorithm constructs a transformation function of the dependent variable y that maps its values into two (0/1) or more categories, thereby enabling classification fitting and prediction.
5.1 From linear regression to the classification problem
- Regression methods are algorithms for predicting and modeling continuous random variables.
  Examples: forecasting house prices, stock trends, commodity sales, etc.
- Classification methods are algorithms for modeling and predicting discrete random variables.
  Examples: filtering spam, detecting financial fraud, predicting whether a review is positive or negative, etc.
- Regression tasks are characterized by labeled data sets whose labels are continuous random variables.
- Classification algorithms are suited to predicting a discrete category (or the probability of each category).
5.2 Classification based on the Sigmoid function
Logistic regression performs classification based on the Sigmoid function, whose form is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The function maps any real input z into the open interval (0, 1), so its output can be read as a probability; values at or above 0.5 are assigned to class 1 and values below 0.5 to class 0.
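A minimal sketch (not part of the original text) that evaluates the Sigmoid function at a few points, showing the (0, 1) mapping and the natural 0.5 threshold at z = 0:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid: squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# sigmoid(0) = 0.5 is the natural decision threshold; the output
# approaches 0 for large negative z and 1 for large positive z
for z in [-10, -1, 0, 1, 10]:
    print(f'sigmoid({z:+d}) = {sigmoid(z):.4f}')
```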
5.3 Finding the optimal solution with gradient descent
As with linear regression, the parameters of logistic regression are solved by optimizing a cost function. However, because the targets of logistic regression are discrete rather than continuous, the log-likelihood cost function (also called the log-likelihood loss function) is usually used to set up the equations for solving the parameters.
5.3.1 The log-likelihood function
Probability:
the possibility that an event occurs in a specific setting; it describes the output of a random variable when the parameters are known.
Likelihood:
the reverse inference: given a known result, it gauges which parameter values could plausibly have produced it; it describes the possible values of the unknown parameters when the output of the random variable is known.
- The probability function is usually written as P(x|θ) (strictly speaking, a conditional probability), where θ denotes the parameters under which the event occurs and x denotes the result.
- The likelihood function is usually written as L(θ|x).
Maximum likelihood:
Since the likelihood describes, for a known result, how plausible the event is under different parameter values, the larger the value of the likelihood function, the more likely it is that the event occurred under the corresponding parameters.
In machine learning, maximum likelihood matters because we want to find, from the known events (the existing samples / training set), the parameters most likely to have produced them, and then use those parameters to infer the probabilities of unknown events (the prediction samples / prediction set).
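As a concrete illustration (hypothetical data, not from the original text): suppose a coin is flipped 10 times and lands heads 7 times. The likelihood of the heads probability θ given this result is L(θ|x) = θ⁷(1-θ)³, and maximizing it over θ recovers the sample proportion:

```python
import numpy as np

# Observed result x: 7 heads in n = 10 flips (hypothetical data)
heads, n = 7, 10

# Likelihood of the Bernoulli parameter theta given the observed result:
# L(theta | x) = theta^heads * (1 - theta)^(n - heads)
thetas = np.linspace(0.01, 0.99, 99)
likelihood = thetas**heads * (1 - thetas)**(n - heads)

# The maximizing theta equals the sample proportion heads/n = 0.7
best = thetas[np.argmax(likelihood)]
print(f'theta maximizing L(theta|x): {best:.2f}')
```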
5.3.2 Solving the parameters with gradient descent
Cost function of logistic regression:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln h_\theta\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\ln\left(1-h_\theta\left(x^{(i)}\right)\right)\right], \qquad h_\theta(x) = \sigma\left(\theta^{T}x\right)$$

- The minus sign is added because maximum likelihood estimation maximizes ln L(θ), while gradient descent is generally used to find a minimum; with the minus sign, gradient descent can be applied to find the parameters θ that minimize -ln L(θ).
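For completeness, the partial derivative of this cost function (a standard result; the original leaves the formula to a figure) has the same form as in linear regression, and gradient descent repeatedly applies the corresponding update rule:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta\left(x^{(i)}\right) - y^{(i)}\right)x_j^{(i)}, \qquad \theta_j := \theta_j - \alpha\,\frac{\partial J(\theta)}{\partial \theta_j}$$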
5.4 Python implementation of logistic regression
5.4.1 Python example of gradient descent: predicting whether students will be admitted (I)
- Import data
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the data (adjust the path to your environment)
df = pd.read_csv('D:/PythonProject/machine/data/5_logisitic_admit.csv')
# Insert a column of all 1s into df (the intercept term)
df.insert(1, 'Ones', 1)
# Filter the rows with admit == 1 into a separate data set
positive = df[df['admit'] == 1]
# Filter the rows with admit == 0 into a separate data set
negative = df[df['admit'] == 0]
# Create the figure and axes
fig, ax = plt.subplots(figsize=(8, 5))
# Scatter plot of the admitted students
ax.scatter(positive['gre'], positive['gpa'], s=30, c='b', marker='o', label='admit')
# Scatter plot of the rejected students
ax.scatter(negative['gre'], negative['gpa'], s=30, c='r', marker='x', label='not admit')
# Set the legend
ax.legend()
# Set the x and y axis labels
ax.set_xlabel('gre')
ax.set_ylabel('gpa')
plt.show()
```
- Build the sigmoid() and predict() functions
```python
# Build the sigmoid() function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Build the predict() function
def predict(theta, X):
    # Predict the probability of admission with the sigmoid() function
    prob = sigmoid(X * theta.T)
    # Threshold the probability: 1 if it is at least 0.5, otherwise 0
    return [1 if a >= 0.5 else 0 for a in prob]
```
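A quick sanity check of the two helpers (illustrative values, not from the original):

```python
theta_demo = np.matrix([0.0, 0.0, 0.0])   # hypothetical all-zero parameters
X_demo = np.matrix([[1.0, 600.0, 3.5]])   # one sample: intercept, gre, gpa
print(sigmoid(0))                   # 0.5
print(predict(theta_demo, X_demo))  # [1], since sigmoid(0) = 0.5 >= 0.5
```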
- Build the gradientDescent() function
""" X,y: Input variable theta: parameter alpha: Learning rate m: Number of samples numIter: Number of gradient descent iterations """ def gradientDescent(X, y, theta, alpha, m, numIter): # Matrix transpose XTrans = X.transpose() # Loop between 1-numIter for i in range(0, numIter): # Convert theta to matrix theta = np.matrix(theta) # Convert predicted values to arrays pred = np.array(predict(theta, X)) # See actual value for predicted value loss = pred - y # Calculated gradient gradient = np.dot(XTrans, loss) # Calculation of theta parameter, update rule theta = theta - alpha * gradient return theta
- Solve the parameters with gradient descent
```python
# Take columns 1-3 of df (Ones, gre, gpa) as the X variables
X = df.iloc[:, 1:4]
# Set the y variable
y = df['admit']
# Convert X and y to arrays for easier computation
X = np.array(X.values)
y = np.array(y.values)
# Get the number of training samples m and the number of variables n
m, n = np.shape(X)
# Initialize theta
theta = np.ones(n)
# Check that the shapes of X, theta and y are consistent
print(X.shape, theta.shape, y.shape)
# Number of iterations
numIter = 1000
# Learning rate
alpha = 0.00001
# Solve for theta with the gradientDescent() function built above
theta = gradientDescent(X, y, theta, alpha, m, numIter)
print('θ={}'.format(theta))
```
θ=[[ 0.82635 -1.3196 0.7192773]]
- Predict and calculate accuracy
```python
# Predict y with the predict() function
pred = predict(theta, X)
# Record 1 when prediction and actual value agree (both 1 or both 0), else 0
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0
           for (a, b) in zip(pred, y)]
# Count the correct predictions (the original used the modulo operator %,
# which only gives the right count while it is below len(correct))
accuracy = sum(map(int, correct))
# Print the prediction accuracy
print('accuracy={:.2f}%'.format(100 * accuracy / m))
```
accuracy=67.25%
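As an optional cross-check (not part of the original example), scikit-learn's LogisticRegression can be fit on the same arrays; its regularized solver will generally find different coefficients, and its accuracy may differ from the handwritten version:

```python
from sklearn.linear_model import LogisticRegression

# fit_intercept=False because X already contains the Ones column
clf = LogisticRegression(fit_intercept=False)
clf.fit(X, y)
print('sklearn θ={}'.format(clf.coef_))
print('sklearn accuracy={:.2f}%'.format(100 * clf.score(X, y)))
```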
Data file:
Link: https://pan.baidu.com/s/1TVPNcRKgrDttDOFV8Q2iDA
Extraction code: h4ex