[Andrew Ng Machine Learning Assignment 2] - Logistic Regression implemented in Python

Recommended running environment: python 3.6

Build a logistic regression model to predict whether a student will be admitted to a university. Each applicant's chance of admission is determined by the scores of two exams, and historical data from previous applicants is available to use as the training set.
Training data (Baidu netdisk): https://pan.baidu.com/s/1w5r8kMrG4H_J2H9zMDyjVg (extraction code: 67ec)

Implementing logistic regression in Python
Objective: build a classifier (i.e. solve for the three parameters theta0, theta1, theta2), which amounts to drawing a decision boundary.
Note: theta1 corresponds to 'Exam 1' and theta2 corresponds to 'Exam 2'.
Set a threshold and decide the admission result according to it.
Note: the threshold is applied to the final probability value to convert it into a class. Typically, a probability above 0.5 means admitted and below 0.5 means not admitted.
Functions to implement:

sigmoid: maps values to probabilities
model: returns the predicted value
cost: computes the loss for the given parameters
gradient: computes the gradient for each parameter
descent: updates the parameters
accuracy: computes the accuracy

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight') #Style Beautification
from sklearn.metrics import classification_report  # evaluation report

Preparing data

data = pd.read_csv('ex2data1.txt', names=['exam1', 'exam2', 'admitted'])
data.head()#Look at the first five elements
exam1 exam2 admitted
0 34.623660 78.024693 0
1 30.286711 43.894998 0
2 35.847409 72.902198 0
3 60.182599 86.308552 1
4 79.032736 75.344376 1
data.describe()
exam1 exam2 admitted
count 100.000000 100.000000 100.000000
mean 65.644274 66.221998 0.600000
std 19.458222 18.582783 0.492366
min 30.058822 30.603263 0.000000
25% 50.919511 48.179205 0.000000
50% 67.032988 67.682381 1.000000
75% 80.212529 79.360605 1.000000
max 99.827858 98.869436 1.000000

## Error case 1: TypeError: unhashable type: '_ColorPalette'
If this error occurs, replace the call with sns.set(context="notebook", style="darkgrid", palette=sns.color_palette("RdBu", 2), color_codes=False), because the palette colors are customized here.

sns.set(context="notebook", style="darkgrid", palette=sns.color_palette("RdBu", 2)) #Set style parameters, default theme darkgrid (grey background + white grid), palette 2 colors

sns.lmplot('exam1', 'exam2', hue='admitted', data=data,   
           size=6, 
           fit_reg=False,                         # fit_reg controls whether the fitted regression line is drawn
           scatter_kws={"s": 50}
          )                                       # hue overlays the classes named by 'admitted' on one plot
plt.show()  # look at the data

def get_X(df):  # read features
#     """
#     use concat to add an intercept column without side effects
#     not efficient for big datasets though
#     """
    ones = pd.DataFrame({'ones': np.ones(len(df))})  # ones is an m-row, one-column DataFrame of 1s
    data = pd.concat([ones, df], axis=1)  # merge by column (axis=1): concat aligns the rows, then joins the columns of the two tables
    return data.iloc[:, :-1].as_matrix()  # this returns an ndarray, not a matrix


def get_y(df):#Read labels
#     '''assume the last column is the target'''
    return np.array(df.iloc[:, -1])#df.iloc[:, -1] is the last column of df


def normalize_feature(df):
#     """Applies function along input axis(default 0) of DataFrame."""
    return df.apply(lambda column: (column - column.mean()) / column.std())#The same applies to feature scaling in logistic regression
X = get_X(data)
print(X.shape)

y = get_y(data)
print(y.shape)
(100, 3)
(100,)
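Note: DataFrame.as_matrix() was deprecated and later removed in newer pandas releases. If you run this notebook on a recent pandas, a drop-in replacement (a minimal sketch, assuming only the pandas version differs from the recommended environment) is to use .values instead:

def get_X_values(df):  # same as get_X above, but works where as_matrix() no longer exists
    ones = pd.DataFrame({'ones': np.ones(len(df))})
    return pd.concat([ones, df], axis=1).iloc[:, :-1].values  # .values replaces .as_matrix()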

sigmoid function

g denotes a commonly used logistic function, an S-shaped (sigmoid) function, with the formula
\[g\left( z \right)=\frac{1}{1+{{e}^{-z}}}\]
Combining it with the linear model gives the hypothesis function of the logistic regression model:
\[{{h}_{\theta }}\left( x \right)=\frac{1}{1+{{e}^{-{{\theta }^{T}}X}}}\]

def sigmoid(z):
    # your code here  (appro ~ 1 lines)
    return 1 / (1 + np.exp(-z))
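For very large |z| the term np.exp(-z) can overflow and emit a RuntimeWarning (the returned value is still the correct limit of 0 or 1). As an optional cross-check, scipy.special.expit computes the same function in a numerically robust way; a small sketch:

from scipy.special import expit  # scipy's built-in logistic (sigmoid) function

z_test = np.array([-1000.0, 0.0, 1000.0])
print(sigmoid(z_test))  # [0.  0.5 1. ], possibly with an overflow warning for -1000
print(expit(z_test))    # same values, computed without the warning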

The following code calls the function you wrote above and plots the sigmoid curve. If your implementation is correct, you should see the figure below.

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(np.arange(-10, 10, step=0.01),
        sigmoid(np.arange(-10, 10, step=0.01)))
ax.set_ylim((-0.1,1.1))     # y-axis display range
ax.set_xlabel('z', fontsize=18)
ax.set_ylabel('g(z)', fontsize=18)
ax.set_title('sigmoid function', fontsize=18)
plt.show()

Cost function

  • \(\max \ell(\theta)=\min -\ell(\theta)\)
  • choose \(-\ell(\theta)\) as the cost function

\[J\left( \theta \right)=-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ {{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right) \right]}\]

theta = np.zeros(3)  # X is (m, n), so theta is an n-vector
theta
array([ 0.,  0.,  0.])
def cost(theta, X, y):
    ''' cost fn is -l(theta) for you to minimize'''
    # your code here  (appro ~ 2 lines)
    return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))
# Hint:X @ theta is equivalent to X.dot(theta)
cost(theta, X, y)
0.69314718055994529

## Error case 2: the cost comes out as 69.31. Solution: y = data.iloc[:, -1].values; y = y.reshape(len(y), 1), i.e. make sure y has the expected shape.

If you write the correct code, the output here should be 0.6931471805599453
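This value is not an accident: with theta = 0 the hypothesis is sigmoid(0) = 0.5 for every example, so the cost reduces to \(-\log(0.5)=\log 2\approx 0.6931\).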

Gradient descent

  • This is batch gradient descent.
  • Vectorized form: \(\frac{1}{m}{{X}^{T}}\left( \mathrm{sigmoid}\left( X\theta \right)-y \right)\)
    \[\frac{\partial J\left( \theta \right)}{\partial {{\theta }_{j}}}=\frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)x_{j}^{(i)}}\]
def gradient(theta, X, y):
    # your code here  (appro ~ 2 lines)
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - y)
gradient(theta, X, y)
array([ -0.1       , -12.00921659, -11.26284221])
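As an optional sanity check (my own addition, not part of the assignment), the analytic gradient can be compared against a central finite-difference approximation of the cost:

# perturb each theta_j by a small epsilon and compare the numerical slope with the analytic gradient
eps = 1e-4
numeric_grad = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(len(theta))
])
print(np.allclose(numeric_grad, gradient(theta, X, y)))  # expect True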

Fitting parameters

import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X, y), method='Newton-CG', jac=gradient)

## Error case 3: warning "Desired error not necessarily achieved due to precision loss". This is usually caused by a problem in the cost function.

## Error case 4: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". NumPy refuses to evaluate a whole array as a single boolean: any() returns True if at least one element is True, while all() returns True only if every element is True.

print(res)
     fun: 0.20349770451259855
     jac: array([  1.62947970e-05,   1.11339134e-03,   1.07609314e-03])
 message: 'Optimization terminated successfully.'
    nfev: 71
    nhev: 0
     nit: 28
    njev: 242
  status: 0
 success: True
       x: array([-25.16576744,   0.20626712,   0.20150754])
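The outline at the top also lists a descent step for updating the parameters; here scipy.optimize does the optimization instead. For reference, a hand-rolled batch gradient descent could look roughly like the sketch below (the learning rate, iteration count, and the feature scaling are my own choices; without scaling the raw exam scores, plain gradient descent converges very slowly):

# scale the two exam-score columns first, then run plain batch gradient descent
scaled = data.copy()
scaled[['exam1', 'exam2']] = normalize_feature(data[['exam1', 'exam2']])
X_s, y_s = get_X(scaled), get_y(scaled)

def batch_gradient_descent(X, y, alpha=0.1, iters=20000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * gradient(theta, X, y)  # step against the gradient
    return theta

theta_gd = batch_gradient_descent(X_s, y_s)
print(theta_gd, cost(theta_gd, X_s, y_s))  # with enough iterations the cost approaches the scipy minimum (~0.2035)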

Prediction and validation on the training set

def predict(x, theta):
    # your code here  (appro ~ 2 lines)
    prob = sigmoid(x @ theta)
    return (prob >= 0.5).astype(int)   # convert the booleans to 0/1 integers
final_theta = res.x
y_pred = predict(X, final_theta)

print(classification_report(y, y_pred))
              precision    recall  f1-score   support

           0       0.87      0.85      0.86        40
           1       0.90      0.92      0.91        60

   micro avg       0.89      0.89      0.89       100
   macro avg       0.89      0.88      0.88       100
weighted avg       0.89      0.89      0.89       100
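The outline at the top also lists an accuracy function. A minimal version evaluated on the training set (a small sketch, not from the original notebook):

accuracy = (y_pred == y).mean()  # fraction of training examples classified correctly
print('training accuracy = {}'.format(accuracy))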

Finding Decision Boundaries

http://stats.stackexchange.com/questions/93569/why-is-logistic-regression-a-linear-classifier

\(X\times \theta = 0\) (this is the boundary line)
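Spelled out for the three parameters here, with \(x=(1,\text{exam1},\text{exam2})\), the boundary \({{\theta }_{0}}+{{\theta }_{1}}\,\text{exam1}+{{\theta }_{2}}\,\text{exam2}=0\) rearranges to

\[\text{exam2}=-\frac{{{\theta }_{0}}}{{{\theta }_{2}}}-\frac{{{\theta }_{1}}}{{{\theta }_{2}}}\,\text{exam1}\]

which is exactly what coef = -(res.x / res.x[2]) below computes: the intercept and slope of the boundary, with the exam2 coefficient normalized to -1.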

print(res.x) # this is final theta
[-25.16576744   0.20626712   0.20150754]
coef = -(res.x / res.x[2])  # find the equation of the line
print(coef)
[ 124.88747248   -1.02361985   -1.        ]

x = np.arange(130, step=0.1)
y = coef[0] + coef[1]*x
data.describe()  # find the range of x and y
exam1 exam2 admitted
count 100.000000 100.000000 100.000000
mean 65.644274 66.221998 0.600000
std 19.458222 18.582783 0.492366
min 30.058822 30.603263 0.000000
25% 50.919511 48.179205 0.000000
50% 67.032988 67.682381 1.000000
75% 80.212529 79.360605 1.000000
max 99.827858 98.869436 1.000000
sns.set(context="notebook", style="ticks", font_scale=1.5)  Default use notebook Context Theme context You can set the size of the output picture.(scale)

sns.lmplot('exam1', 'exam2', hue='admitted', data=data, 
           size=6, 
           fit_reg=False, 
           scatter_kws={"s": 25}
          )

plt.plot(x, y, 'grey')
plt.xlim(0, 130) 
plt.ylim(0, 130)
plt.title('Decision Boundary')
plt.show()

3 - Regularized Logistic Regression

df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
df.head()
test1 test2 accepted
0 0.051267 0.69956 1
1 -0.092742 0.68494 1
2 -0.213710 0.69225 1
3 -0.375000 0.50219 1
4 -0.513250 0.46564 1
sns.set(context="notebook", style="ticks", font_scale=1.5)

sns.lmplot('test1', 'test2', hue='accepted', data=df, 
           size=6, 
           fit_reg=False, 
           scatter_kws={"s": 50}
          )

plt.title('Regularized Logistic Regression')
plt.show()

feature mapping

polynomial expansion

for i in 0..power:
  for p in 0..i:
    output x^(i-p) * y^p

def feature_mapping(x, y, power, as_ndarray=False):
#     """return mapped features as ndarray or dataframe"""

    data = {"f{}{}".format(i - p, p): np.power(x, i - p) * np.power(y, p)
                for i in np.arange(power + 1)
                for p in np.arange(i + 1)
            }

    if as_ndarray:
        return pd.DataFrame(data).as_matrix()
    else:
        return pd.DataFrame(data)

x1 = np.array(df.test1)
x2 = np.array(df.test2)
data = feature_mapping(x1, x2, power=6)
print(data.shape)
data.head()
(118, 28)
f00 f01 f02 f03 f04 f05 f06 f10 f11 f12 ... f30 f31 f32 f33 f40 f41 f42 f50 f51 f60
0 1.0 0.69956 0.489384 0.342354 0.239497 0.167542 0.117206 0.051267 0.035864 0.025089 ... 0.000135 0.000094 0.000066 0.000046 0.000007 0.000005 0.000003 3.541519e-07 2.477505e-07 1.815630e-08
1 1.0 0.68494 0.469143 0.321335 0.220095 0.150752 0.103256 -0.092742 -0.063523 -0.043509 ... -0.000798 -0.000546 -0.000374 -0.000256 0.000074 0.000051 0.000035 -6.860919e-06 -4.699318e-06 6.362953e-07
2 1.0 0.69225 0.479210 0.331733 0.229642 0.158970 0.110047 -0.213710 -0.147941 -0.102412 ... -0.009761 -0.006757 -0.004677 -0.003238 0.002086 0.001444 0.001000 -4.457837e-04 -3.085938e-04 9.526844e-05
3 1.0 0.50219 0.252195 0.126650 0.063602 0.031940 0.016040 -0.375000 -0.188321 -0.094573 ... -0.052734 -0.026483 -0.013299 -0.006679 0.019775 0.009931 0.004987 -7.415771e-03 -3.724126e-03 2.780914e-03
4 1.0 0.46564 0.216821 0.100960 0.047011 0.021890 0.010193 -0.513250 -0.238990 -0.111283 ... -0.135203 -0.062956 -0.029315 -0.013650 0.069393 0.032312 0.015046 -3.561597e-02 -1.658422e-02 1.827990e-02

5 rows × 28 columns
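A quick sanity check on the size of the mapping (my own addition): the number of terms of total degree at most power is (power + 1)(power + 2) / 2, which is 28 for power = 6 and matches the shape printed above.

power = 6
print((power + 1) * (power + 2) // 2)                                   # 28
print(feature_mapping(np.array([0.5]), np.array([-0.5]), power).shape)  # (1, 28)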

data.describe()
f00 f01 f02 f03 f04 f05 f06 f10 f11 f12 ... f30 f31 f32 f33 f40 f41 f42 f50 f51 f60
count 118.0 118.000000 118.000000 118.000000 1.180000e+02 118.000000 1.180000e+02 118.000000 118.000000 118.000000 ... 1.180000e+02 118.000000 1.180000e+02 118.000000 1.180000e+02 118.000000 1.180000e+02 1.180000e+02 118.000000 1.180000e+02
mean 1.0 0.183102 0.301370 0.142350 1.710985e-01 0.115710 1.257256e-01 0.054779 -0.025472 0.015483 ... 5.983333e-02 -0.005251 9.432094e-03 -0.001705 1.225384e-01 0.011812 1.893340e-02 5.196507e-02 -0.000703 7.837118e-02
std 0.0 0.519743 0.284536 0.326134 2.815658e-01 0.299092 2.964416e-01 0.496654 0.224075 0.150143 ... 2.746459e-01 0.096738 5.455787e-02 0.037443 2.092709e-01 0.072274 3.430092e-02 2.148098e-01 0.058271 1.938621e-01
min 1.0 -0.769740 0.000026 -0.456071 6.855856e-10 -0.270222 1.795116e-14 -0.830070 -0.484096 -0.483743 ... -5.719317e-01 -0.296854 -1.592528e-01 -0.113448 1.612020e-09 -0.246068 2.577297e-10 -3.940702e-01 -0.203971 6.472253e-14
25% 1.0 -0.254385 0.061086 -0.016492 3.741593e-03 -0.001072 2.298277e-04 -0.372120 -0.178209 -0.042980 ... -5.155632e-02 -0.029360 -3.659760e-03 -0.005749 1.869975e-03 -0.001926 1.258285e-04 -7.147973e-03 -0.006381 8.086369e-05
50% 1.0 0.213455 0.252195 0.009734 6.360222e-02 0.000444 1.604015e-02 -0.006336 -0.016521 -0.000039 ... -2.544062e-07 -0.000512 -1.473547e-07 -0.000005 2.736163e-02 0.000205 3.387050e-03 -1.021440e-11 -0.000004 4.527344e-03
75% 1.0 0.646562 0.464189 0.270310 2.155453e-01 0.113020 1.001215e-01 0.478970 0.100795 0.079510 ... 1.099616e-01 0.015050 1.370560e-02 0.001024 1.520801e-01 0.019183 2.090875e-02 2.526861e-02 0.002104 5.932959e-02
max 1.0 1.108900 1.229659 1.363569 1.512062e+00 1.676725 1.859321e+00 1.070900 0.568307 0.505577 ... 1.228137e+00 0.369805 2.451845e-01 0.183548 1.315212e+00 0.304409 2.018260e-01 1.408460e+00 0.250577 1.508320e+00

8 rows × 28 columns

Regularized cost

\[J\left( \theta \right)=\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ -{{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right) \right]}+\frac{\lambda }{2m}\sum\limits_{j=1}^{n}{\theta _{j}^{2}}\]

theta = np.zeros(data.shape[1])
X = feature_mapping(x1, x2, power=6, as_ndarray=True)
print(X.shape)

y = get_y(df)
print(y.shape)
(118, 28)
(118,)
def regularized_cost(theta, X, y, l=1):
    # your code here  (appro ~ 3 lines)
    theta_j1_to_n = theta[1:]
    regularized_term = (l / (2 * len(X))) * np.power(theta_j1_to_n, 2).sum()
    
    return  cost(theta, X, y) + regularized_term
regularized_cost(theta, X, y, l=1)
0.6931471805599454

Because we set theta to zero, the regularized cost function should have the same value as the cost function.

Regularized gradient

\[\frac{\partial J\left( \theta \right)}{\partial {{\theta }_{j}}}=\left( \frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)x_{j}^{(i)}} \right)+\frac{\lambda }{m}{{\theta }_{j}}\text{ for }j\ge 1\]

def regularized_gradient(theta, X, y, l=1):
    # your code here  (appro ~ 3 lines)
    theta_j1_to_n = theta[1:]      #No theta 0
    regularized_theta = (l / len(X)) * theta_j1_to_n
    
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, y) + regularized_term
regularized_gradient(theta, X, y)
array([  8.47457627e-03,   7.77711864e-05,   3.76648474e-02,
         2.34764889e-02,   3.93028171e-02,   3.10079849e-02,
         3.87936363e-02,   1.87880932e-02,   1.15013308e-02,
         8.19244468e-03,   3.09593720e-03,   4.47629067e-03,
         1.37646175e-03,   5.03446395e-02,   7.32393391e-03,
         1.28600503e-02,   5.83822078e-03,   7.26504316e-03,
         1.83559872e-02,   2.23923907e-03,   3.38643902e-03,
         4.08503006e-04,   3.93486234e-02,   4.32983232e-03,
         6.31570797e-03,   1.99707467e-02,   1.09740238e-03,
         3.10312442e-02])
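A quick check (my own addition, not in the original): with l = 0 the regularization term vanishes, so regularized_gradient should agree with the plain gradient.

print(np.allclose(regularized_gradient(theta, X, y, l=0), gradient(theta, X, y)))  # True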

Fitting parameters

import scipy.optimize as opt
print('init cost = {}'.format(regularized_cost(theta, X, y)))

res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, y), method='Newton-CG', jac=regularized_gradient)
res
init cost = 0.6931471805599454





     fun: 0.5290027297127846
     jac: array([ -1.21895855e-07,   6.76917339e-08,  -2.48482252e-08,
         1.91138610e-08,  -2.53407958e-08,  -1.49952031e-08,
        -1.96092249e-08,   6.15930450e-09,   1.63894777e-08,
         1.46785479e-08,   1.33205342e-08,   8.41414001e-09,
         5.13850532e-09,  -2.89545939e-08,   2.62474472e-08,
        -3.46392476e-08,  -1.57940013e-09,  -1.28923829e-08,
         1.31222180e-08,  -9.17113615e-10,  -1.00687818e-08,
        -3.89939722e-09,  -3.51985068e-08,   1.79476715e-08,
        -1.62131622e-08,   5.09965938e-09,   2.83437559e-09,
        -2.64933033e-08])
 message: 'Optimization terminated successfully.'
    nfev: 7
    nhev: 0
     nit: 6
    njev: 66
  status: 0
 success: True
       x: array([ 1.2727397 ,  1.18108864, -1.43166408, -0.17513002, -1.19281737,
       -0.45635879, -0.92465314,  0.62527143, -0.91742439, -0.3572384 ,
       -0.27470614, -0.29537757, -0.14388708, -2.01995993, -0.36553321,
       -0.61555769, -0.27778492, -0.32738055,  0.12400756, -0.05098998,
       -0.04473167,  0.01556612, -1.45815795, -0.20600495, -0.29243259,
       -0.24218752,  0.02777155, -1.04320388])

Prediction

final_theta = res.x
y_pred = predict(X, final_theta)

print(classification_report(y, y_pred))
              precision    recall  f1-score   support

           0       0.90      0.75      0.82        60
           1       0.78      0.91      0.84        58

   micro avg       0.83      0.83      0.83       118
   macro avg       0.84      0.83      0.83       118
weighted avg       0.84      0.83      0.83       118

Use different values of \(\lambda\) (this is a constant).

Draw decision boundary

  • We find all x satisfying \(X\times \theta = 0\).
  • Instead of solving the polynomial equation, just create a coordinate grid of (x, y) that is dense enough, find all points where \(X\times \theta\) is close enough to 0, and plot them.
def draw_boundary(power, l):
#     """
#     power: polynomial power for mapped feature
#     l: lambda constant
#     """
    density = 1000
    threshhold = 2 * 10**-3

    final_theta = feature_mapped_logistic_regression(power, l)
    x, y = find_decision_boundary(density, power, final_theta, threshhold)

    df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
    sns.lmplot('test1', 'test2', hue='accepted', data=df, size=6, fit_reg=False, scatter_kws={"s": 100})

    plt.scatter(x, y, c='R', s=10)
    plt.title('Decision boundary')
    plt.show()
def feature_mapped_logistic_regression(power, l):
#     """for drawing purpose only.. not a well generealize logistic regression
#     power: int
#         raise x1, x2 to polynomial power
#     l: int
#         lambda constant for regularization term
#     """
    df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
    x1 = np.array(df.test1)
    x2 = np.array(df.test2)
    y = get_y(df)

    X = feature_mapping(x1, x2, power, as_ndarray=True)
    theta = np.zeros(X.shape[1])

    res = opt.minimize(fun=regularized_cost,
                       x0=theta,
                       args=(X, y, l),
                       method='TNC',
                       jac=regularized_gradient)
    final_theta = res.x

    return final_theta
def find_decision_boundary(density, power, theta, threshhold):
    t1 = np.linspace(-1, 1.5, density)  #1000 samples
    t2 = np.linspace(-1, 1.5, density)

    cordinates = [(x, y) for x in t1 for y in t2]
    x_cord, y_cord = zip(*cordinates)
    mapped_cord = feature_mapping(x_cord, y_cord, power)  # this is a dataframe

    inner_product = mapped_cord.as_matrix() @ theta

    decision = mapped_cord[np.abs(inner_product) < threshhold]

    return decision.f10, decision.f01
# helper functions for finding the decision boundary

Change the value of \(\lambda\) and compare the resulting decision boundaries.

draw_boundary(power=6, l=1)     #set lambda = 1

draw_boundary(power=6, l=0)  # set lambda = 0

draw_boundary(power=6, l=100)  # set lambda = 100
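To compare the effect numerically as well as visually, a small sketch (my own addition, reusing feature_mapped_logistic_regression and the power-6 X and y from above) printing the training accuracy for each lambda:

for l in (0, 1, 100):
    th = feature_mapped_logistic_regression(power=6, l=l)
    acc = (predict(X, th) == y).mean()
    print('lambda = {:>3}: training accuracy = {:.2f}'.format(l, acc))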
