Artificial intelligence: implementing polynomial regression in Python

This article introduces how to implement polynomial regression in Python. In the previous article we covered linear regression; this time we focus on polynomial regression. You can use it as a reference.

Contents

1 Overview

1.1 Supervised learning

1.2 Polynomial regression

2 Concept

3 Case implementation - Method 1

3.1 Case analysis

3.2 Code implementation

3.3 Results

3.4 Visualization

4 Case implementation - Method 2

4.1 Code

4.2 Results

4.3 Visualization

1 Overview

1.1 Supervised learning

Like linear regression, polynomial regression is a supervised learning method: the model is trained on samples whose target values are already known (here, the prices of houses of known size) and is then used to predict the target for new inputs.

1.2 Polynomial regression

Last time we covered linear regression; this time we focus on polynomial regression.

Polynomial regression is a regression analysis method that studies the relationship between a dependent variable and one or more independent variables by fitting a polynomial. If there is only one independent variable, it is called univariate polynomial regression; if there are several, it is called multivariate polynomial regression.

(1) In univariate regression analysis, if the relationship between the dependent variable y and the independent variable x is nonlinear but no suitable function curve can be found to fit it, univariate polynomial regression can be used.
(2) The greatest advantage of polynomial regression is that the observed points can be approximated more and more closely by adding higher-order terms of x until the fit is satisfactory.
(3) In fact, polynomial regression can handle a fairly broad class of nonlinear problems and plays an important role in regression analysis, because any continuous function on a closed interval can be approximated arbitrarily well by a polynomial (the Weierstrass approximation theorem). The general form of the model is given below.
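In general, for a single independent variable x, a degree-k polynomial regression model has the form y = b0 + b1*x + b2*x^2 + ... + bk*x^k + e, where e is the error term. Although the curve is nonlinear in x, the model is still linear in the coefficients b0, ..., bk, so after expanding x into the features [1, x, x^2, ..., x^k] it can be estimated with ordinary least squares, exactly like linear regression.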

2 Concept

In the linear regression example discussed earlier, a straight line was used to fit the linear relationship between the data's input and output. Unlike linear regression, polynomial regression uses a curve to fit the mapping between the input and the output of the data.
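A minimal sketch of what this looks like with scikit-learn (not taken from the original article, and using a few made-up points purely for illustration): expand x into the features [1, x, x^2] with PolynomialFeatures and run an ordinary linear regression on them.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])        # single input feature
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0])               # synthetic values, roughly y = x^2 + 1

x_poly = PolynomialFeatures(degree=2).fit_transform(x)    # columns: 1, x, x^2
model = LinearRegression().fit(x_poly, y)                 # ordinary least squares on the expanded features
print(model.intercept_, model.coef_)                      # intercept plus coefficients for 1, x, x^2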

3 Case implementation - Method 1

3.1 Case analysis

Application background: we previously fitted a linear regression to known house sizes and transaction prices, which lets us predict the price of a house whose size is known but whose price is not. In practice, however, such a straight-line fit is often not good enough, so here we fit a polynomial regression to the same data set.
Objective: build a polynomial regression equation from the house transaction data and use it to predict house prices.

Each transaction record contains the area of the house and the corresponding transaction price:

(1) the house area, in square feet (ft²)

(2) the transaction price, in units of 10,000
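Before modelling, it can help to take a quick look at the data file. The snippet below is a small sketch, assuming that 'Polynomial linear regression.csv' has a header row with columns named 'x' (area) and 'y' (price), as the pandas-based code in Method 2 suggests.

import pandas as pd

# Load the transaction data: column 'x' = house area (ft^2), column 'y' = price (10,000s).
# File name and column names are taken from the code in Method 2 below.
df = pd.read_csv('Polynomial linear regression.csv')
print(df.head())         # first few rows
print(df.describe())     # count, mean, min, max, etc. for area and price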

3.2 Code implementation

import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures


# Read the dataset
datasets_X = []
datasets_Y = []
fr = open('Polynomial linear regression.csv','r')
lines = fr.readlines()
for line in lines:
    items = line.strip().split(',')
    if not items[0].lstrip('-').isdigit():     # skip a possible header row (e.g. "x,y") or empty line
        continue
    datasets_X.append(int(items[0]))
    datasets_Y.append(int(items[1]))
fr.close()

length = len(datasets_X)
datasets_X = np.array(datasets_X).reshape([length,1])
datasets_Y = np.array(datasets_Y)

minX = min(datasets_X)
maxX = max(datasets_X)
X = np.arange(minX,maxX).reshape([-1,1])       # grid of areas used to plot the fitted curve


poly_reg = PolynomialFeatures(degree = 2)      # degree=2: build quadratic polynomial features from datasets_X
X_poly = poly_reg.fit_transform(datasets_X)    # the columns of X_poly are [1, x, x^2]
lin_reg_2 = linear_model.LinearRegression()
lin_reg_2.fit(X_poly, datasets_Y)              # linear regression learns the mapping between X_poly and datasets_Y

print(X_poly)
print(lin_reg_2.predict(poly_reg.fit_transform(X)))
print('Coefficients:', lin_reg_2.coef_)        # regression coefficients; the first entry belongs to the constant column and is 0
print('intercept:', lin_reg_2.intercept_)      # intercept of the regression equation
print('the model is y={0}+({1}*x)+({2}*x^2)'.format(lin_reg_2.intercept_,lin_reg_2.coef_[1],lin_reg_2.coef_[2]))
# Display in a figure
plt.scatter(datasets_X, datasets_Y, color = 'red')  # draw the data points in red
# plot the fitted curve; X must first be expanded into polynomial features as well
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()

  

3.3 Results

[[1.0000000e+00 1.0000000e+03 1.0000000e+06]
 [1.0000000e+00 7.9200000e+02 6.2726400e+05]
 [1.0000000e+00 1.2600000e+03 1.5876000e+06]
 [1.0000000e+00 1.2620000e+03 1.5926440e+06]
 [1.0000000e+00 1.2400000e+03 1.5376000e+06]
 [1.0000000e+00 1.1700000e+03 1.3689000e+06]
 [1.0000000e+00 1.2300000e+03 1.5129000e+06]
 [1.0000000e+00 1.2550000e+03 1.5750250e+06]
 [1.0000000e+00 1.1940000e+03 1.4256360e+06]
 [1.0000000e+00 1.4500000e+03 2.1025000e+06]
 [1.0000000e+00 1.4810000e+03 2.1933610e+06]
 [1.0000000e+00 1.4750000e+03 2.1756250e+06]
 [1.0000000e+00 1.4820000e+03 2.1963240e+06]
 [1.0000000e+00 1.4840000e+03 2.2022560e+06]
 [1.0000000e+00 1.5120000e+03 2.2861440e+06]
 [1.0000000e+00 1.6800000e+03 2.8224000e+06]
 [1.0000000e+00 1.6200000e+03 2.6244000e+06]
 [1.0000000e+00 1.7200000e+03 2.9584000e+06]
 [1.0000000e+00 1.8000000e+03 3.2400000e+06]
 [1.0000000e+00 4.4000000e+03 1.9360000e+07]
 [1.0000000e+00 4.2120000e+03 1.7740944e+07]
 [1.0000000e+00 3.9200000e+03 1.5366400e+07]
 [1.0000000e+00 3.2120000e+03 1.0316944e+07]
 [1.0000000e+00 3.1510000e+03 9.9288010e+06]
 [1.0000000e+00 3.1000000e+03 9.6100000e+06]
 [1.0000000e+00 2.7000000e+03 7.2900000e+06]
 [1.0000000e+00 2.6120000e+03 6.8225440e+06]
 [1.0000000e+00 2.7050000e+03 7.3170250e+06]
 [1.0000000e+00 2.5700000e+03 6.6049000e+06]
 [1.0000000e+00 2.4420000e+03 5.9633640e+06]
 [1.0000000e+00 2.3870000e+03 5.6977690e+06]
 [1.0000000e+00 2.2920000e+03 5.2532640e+06]
 [1.0000000e+00 2.3080000e+03 5.3268640e+06]
 [1.0000000e+00 2.2520000e+03 5.0715040e+06]
 [1.0000000e+00 2.2020000e+03 4.8488040e+06]
 [1.0000000e+00 2.1570000e+03 4.6526490e+06]
 [1.0000000e+00 2.1400000e+03 4.5796000e+06]
 [1.0000000e+00 4.0000000e+03 1.6000000e+07]
 [1.0000000e+00 4.2000000e+03 1.7640000e+07]
 [1.0000000e+00 3.9000000e+03 1.5210000e+07]
 [1.0000000e+00 3.5440000e+03 1.2559936e+07]
 [1.0000000e+00 2.9800000e+03 8.8804000e+06]
 [1.0000000e+00 4.3550000e+03 1.8966025e+07]
 [1.0000000e+00 3.1500000e+03 9.9225000e+06]
 [1.0000000e+00 3.0250000e+03 9.1506250e+06]
 [1.0000000e+00 3.4500000e+03 1.1902500e+07]
 [1.0000000e+00 4.4020000e+03 1.9377604e+07]
 [1.0000000e+00 3.4540000e+03 1.1930116e+07]
 [1.0000000e+00 8.9000000e+02 7.9210000e+05]]
[231.16788093 231.19868474 231.22954958 ... 739.2018995  739.45285011
 739.70386176]
Coefficients: [ 0.00000000e+00 -1.75650177e-02  3.05166076e-05]
intercept: 225.93740561055927
the model is y=225.93740561055927+(-0.017565017675036532*x)+(3.05166076e-05*x^2)

3.4 Visualization

(Figure: the data points are drawn as red dots and the fitted quadratic curve as a blue line, with Area on the x-axis and Price on the y-axis.)

4 Case implementation - Method 2

4.1 Code

import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import pandas as pd
import warnings
  
warnings.filterwarnings(action="ignore", module="sklearn")
  
dataset = pd.read_csv('Polynomial linear regression.csv')
X = np.asarray(dataset.get('x'))
y = np.asarray(dataset.get('y'))
  
# Split the data into a training set and a test set (the last two samples are held out for testing)
X_train = X[:-2]
X_test = X[-2:]
y_train = y[:-2]
y_test = y[-2:]
  
# fit_intercept is True
model1 = Pipeline([('poly', PolynomialFeatures(degree=2)), ('linear', LinearRegression(fit_intercept=True))])
model1 = model1.fit(X_train[:, np.newaxis], y_train)
# manual prediction from intercept_ and the linear coefficient only (note that the x^2 term is not added here)
y_test_pred1 = model1.named_steps['linear'].intercept_ + model1.named_steps['linear'].coef_[1] * X_test
print('while fit_intercept is True:................')
print('Coefficients: ', model1.named_steps['linear'].coef_)
print('Intercept:', model1.named_steps['linear'].intercept_)
print('the model is: y = ', model1.named_steps['linear'].intercept_, ' + ', model1.named_steps['linear'].coef_[1],
      '* X')
# Mean square error
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_test_pred1))
# R^2 score: 1.0 means a perfect fit; it can be negative when the model fits the test data worse than simply predicting the mean
print('Variance score: %.2f' % r2_score(y_test, y_test_pred1), '\n')
  
# fit_intercept is False
model2 = Pipeline([('poly', PolynomialFeatures(degree=2)), ('linear', LinearRegression(fit_intercept=False))])
model2 = model2.fit(X_train[:, np.newaxis], y_train)
# manual prediction: with fit_intercept=False, coef_[0] plays the role of the intercept, followed by the linear and quadratic terms
y_test_pred2 = model2.named_steps['linear'].coef_[0] + model2.named_steps['linear'].coef_[1] * X_test + \
               model2.named_steps['linear'].coef_[2] * X_test * X_test
print('while fit_intercept is False:..........................................')
print('Coefficients: ', model2.named_steps['linear'].coef_)
print('Intercept:', model2.named_steps['linear'].intercept_)
print('the model is: y = ', model2.named_steps['linear'].coef_[0], '+', model2.named_steps['linear'].coef_[1], '* X + ',
      model2.named_steps['linear'].coef_[2], '* X^2')
# Mean square error
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_test_pred2))
# R^2 score: 1.0 means a perfect fit; it can be negative when the model fits the test data worse than simply predicting the mean
print('Variance score: %.2f' % r2_score(y_test, y_test_pred2), '\n')
  
plt.xlabel('x')
plt.ylabel('y')
# Draw a scatter diagram of the training set
plt.scatter(X_train, y_train, alpha=0.8, color='black')
# Plot the fitted quadratic curve reconstructed from model2's coefficients
plt.plot(X_train, model2.named_steps['linear'].coef_[0] + model2.named_steps['linear'].coef_[1] * X_train +
         model2.named_steps['linear'].coef_[2] * X_train * X_train, color='red',
         linewidth=1)
plt.show()

  

4.2 Results

If you do not use a framework, you have to add the higher-order terms to the data by hand. With a framework this is much more convenient: scikit-learn's Pipeline chains the preprocessing step (PolynomialFeatures) and the regression step (LinearRegression) together.

When degree=1 in PolynomialFeatures, the result is the same as using LinearRegression directly, i.e. a linear model. When degree=2 it is a quadratic equation: a parabola for one input variable, a paraboloid for two, and so on.
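As a quick check of this equivalence (an illustrative sketch with made-up numbers, not part of the original article), a degree-1 pipeline and a plain LinearRegression fitted on the same data learn essentially the same slope and intercept.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([3.0, 5.1, 6.9, 9.2])            # synthetic, roughly linear data

deg1 = Pipeline([('poly', PolynomialFeatures(degree=1)),
                 ('linear', LinearRegression())]).fit(X_demo, y_demo)
plain = LinearRegression().fit(X_demo, y_demo)

# Both models recover (approximately) the same slope and intercept.
print(deg1.named_steps['linear'].coef_[1], deg1.named_steps['linear'].intercept_)
print(plain.coef_[0], plain.intercept_)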

Method 2 also illustrates the fit_intercept parameter of LinearRegression. Its effect can be seen from this example:

When fit_intercept=True, the first value in coef_ (the coefficient of the constant column added by PolynomialFeatures) is 0, and intercept_ holds the actual intercept.

When fit_intercept=False, the first value in coef_ is the intercept, and intercept_ is 0.

In the output below, the first block is the result with fit_intercept=True and the second block is the result with fit_intercept=False.
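As an aside (a small sketch continuing from the variables defined in the code above), instead of rebuilding the polynomial from coef_ and intercept_ by hand, the fitted pipelines can also predict directly; Pipeline.predict applies the PolynomialFeatures step automatically. Note that this includes the x^2 term, so for model1 it will differ from the manual y_test_pred1 above, which drops that term.

# More concise way to obtain test-set predictions from the fitted pipelines:
y_test_pred1_alt = model1.predict(X_test[:, np.newaxis])
y_test_pred2_alt = model2.predict(X_test[:, np.newaxis])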

while fit_intercept is True:................
Coefficients:  [ 0.00000000e+00 -3.70858180e-04  2.78609637e-05]
Intercept: 204.25470490804574
the model is: y =  204.25470490804574  +  -0.00037085818009180454 * X
Mean squared error: 26964.95
Variance score: -3.61 
  
while fit_intercept is False:..........................................
Coefficients:  [ 2.04254705e+02 -3.70858180e-04  2.78609637e-05]
Intercept: 0.0
the model is: y =  204.2547049080572 + -0.0003708581801012066 * X +  2.7860963722809286e-05 * X^2
Mean squared error: 7147.78
Variance score: -0.22 

4.3 Visualization

(Figure: scatter plot of the training data in black with the quadratic curve of model2 drawn in red, x on the horizontal axis and y on the vertical axis.)

That is the whole content of this article. I hope it helps you.
