A-05 forward selection method and forward gradient method


Forward selection method and forward gradient method

This article introduces the forward selection method and the forward gradient method. Both of them turn the vector operations on matrices in linear regression into vector operations in plane geometry.

Forward selection

Forward selection is a typical greedy algorithm.

The regression coefficients of a linear model are often solved with the forward selection method. For a training set with $m$ samples, each having $n$ features, assume that a linear model $Y=X\omega$ can be fitted, where $Y$ is an $m\times 1$ vector, $X$ is an $m\times n$ matrix, and $\omega$ is an $n\times 1$ vector. The parameter $\omega$ of the model can then be found by forward selection.

Cosine similarity and projection

First, among all the features, select the $X_i$ that is closest to $Y$ (largest cosine similarity, i.e. smallest angle between them), and project $Y$ onto it:

$$\hat{Y}=X_i\omega_i$$

where $\omega_i=\frac{\langle X_i,Y\rangle}{\|X_i\|^2}$. Since $\frac{\langle X_i,Y\rangle}{\|X_i\|}=\|Y\|\cos\alpha$, where $\alpha$ is the angle between $X_i$ and $Y$, $\hat{Y}$ can be regarded as the projection of $Y$ onto $X_i$.

Next compute the residual $Y_{err}=Y-\hat{Y}$, which is orthogonal to $X_i$. Among the remaining features $X_i\ (i=1,2,\cdots,i-1,i+1,\cdots,n)$, select the new $X_i$ that is closest to the residual $Y_{err}$, and repeat the projection and residual computation until the residual is 0. Stop the algorithm; the accumulated coefficients give $\omega$.
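In code, the whole loop looks roughly like the following NumPy sketch; the function name `forward_selection` and the stopping tolerance `tol` are illustrative choices, not details from the original article.

import numpy as np

def forward_selection(X, y, tol=1e-10):
    """Greedy forward selection by successive projections (rough sketch).

    X: (m, n) feature matrix, y: (m,) target vector.
    Returns the coefficient vector omega of shape (n,).
    """
    m, n = X.shape
    omega = np.zeros(n)
    residual = y.astype(float).copy()
    remaining = list(range(n))
    while remaining and np.linalg.norm(residual) > tol:
        # pick the remaining feature with the largest |cosine similarity| to the residual
        # (absolute value so that negatively correlated features can also be chosen)
        cos = [abs(X[:, j] @ residual) / (np.linalg.norm(X[:, j]) * np.linalg.norm(residual))
               for j in remaining]
        i = remaining[int(np.argmax(cos))]
        # project the residual onto X_i: omega_i = <X_i, residual> / ||X_i||^2
        omega[i] = (X[:, i] @ residual) / (X[:, i] @ X[:, i])
        residual = residual - omega[i] * X[:, i]
        remaining.remove(i)
    return omega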

An example

# Illustrations
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

# X1*w1: projection of Y onto X1
plt.annotate('', xy=(8, 5), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(6, 4.5, s=r'$X_1*\omega_1$', color='g')
# X2*w2: projection of the residual Y1 onto X2
plt.annotate('', xy=(9.3, 7.5), xytext=(8, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(9.3, 7, s=r'$X_2*\omega_2$', color='g')
# X1
plt.annotate('', xy=(4, 5), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2.5, 4.5, s='$X_1$', color='g')
# X2
plt.annotate('', xy=(3, 7), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2, 6, s='$X_2$', color='g')
# X2 translated to the tip of X1*w1
plt.annotate('', xy=(9, 7), xytext=(8, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(8.2, 6.5, s='$X_2$', color='g')
# Y
plt.annotate('', xy=(8, 8), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5, 7.5, s='$Y$', color='g')
# Y1: residual of Y after projecting onto X1
plt.annotate('', xy=(8, 8), xytext=(8, 5),
             arrowprops=dict(arrowstyle="-", color='gray'))
plt.text(7.5, 6.5, s='$Y_1$', color='g')
# Y2: residual of Y1 after projecting onto X2
plt.annotate('', xy=(9.3, 7.5), xytext=(8, 8),
             arrowprops=dict(arrowstyle="-", color='gray'))
plt.text(8.5, 8, s='$Y_2$', color='g')

plt.xlim(0, 11)
plt.ylim(2, 10)
plt.title('Examples of forward selection', fontproperties=font, fontsize=20)
plt.show()

[Figure: example of forward selection (the plot produced by the code above)]

As the figure shows, assume $X$ has 2 dimensions. First, $Y$ is projected onto the closer vector $X_1$, giving the red line $X_1*\omega_1$; the gray vertical line $Y_1$ is the residual. Since only $X_2$ is left, the residual $Y_1$ is then projected onto $X_2$, giving the red line $X_2*\omega_2$ (if there were more candidates than $X_2$, we would pick the $X_i$ closest to $Y_1$). At this point $X_1\omega_1+X_2\omega_2$ approximates $Y$, that is, $\omega=[\omega_1,\omega_2]$.

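To make the picture concrete, here is a tiny numeric version of the same two projection steps, with made-up vectors standing in for $X_1$, $X_2$ and $Y$ (the values are purely illustrative):

import numpy as np

# made-up data: x1 and x2 are the two features, y plays the role of Y in the figure
x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([1.0, 1.0, 0.0])
y = np.array([3.0, 1.0, 0.0])

# step 1: Y is closer to X1, so project Y onto X1 -> coefficient w1, residual y1
w1 = (x1 @ y) / (x1 @ x1)
y1 = y - w1 * x1

# step 2: project the residual y1 onto X2 -> coefficient w2, residual y2
w2 = (x2 @ y1) / (x2 @ x2)
y2 = y1 - w2 * x2

print(w1, w2)              # omega = [w1, w2] = [3.0, 0.5]
print(np.linalg.norm(y2))  # a residual remains because X1 and X2 are not orthogonal

The final residual is not zero here because $X_1$ and $X_2$ are not orthogonal, which is exactly the drawback discussed below.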
Advantages and disadvantages of forward selection

Advantages

  1. The algorithm does only one projection for each $X_i$, so it is fast.

Shortcomings

  1. Since the variables $X_i$ are generally not orthogonal, each projection only reduces the residual along one direction, so the forward selection method can only give a rough, locally optimal approximation of $\omega$. (The forward gradient method below can be considered as an alternative.)

Forward gradient method


The forward gradient method also starts from the feature $X_i$ that is closest to $Y$, but instead of projecting all the way onto it, it only walks a small step of length $\epsilon$ ($\epsilon$ is a manually tuned hyperparameter) along the direction of $X_i$. Then, among all the features $X_i\ (i=1,2,\cdots,i-1,i,i+1,\cdots,n)$ (note that the current feature itself may be selected again), it picks the vector closest to the residual $Y_{err}$ (the residual is computed in the same way as in the forward selection method) and walks another small step, repeating until the residual is 0. Stop the algorithm and obtain $\omega$.
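A minimal NumPy sketch of the forward gradient loop follows; the function name `forward_stagewise`, the default `eps` and the stopping rule are assumptions made for illustration, not details from the original article.

import numpy as np

def forward_stagewise(X, y, eps=0.01, max_iter=10000, tol=1e-6):
    """Forward gradient (forward stagewise) regression, a rough sketch.

    At every step, find the feature most correlated with the current residual
    and move its coefficient by a small step eps, then update the residual.
    """
    m, n = X.shape
    omega = np.zeros(n)
    residual = y.astype(float).copy()
    for _ in range(max_iter):
        if np.linalg.norm(residual) < tol:
            break
        corr = X.T @ residual                 # correlation of each feature with the residual
        i = int(np.argmax(np.abs(corr)))      # the feature currently closest to the residual
        step = eps * np.sign(corr[i])         # walk a small step eps along X_i
        omega[i] += step
        residual -= step * X[:, i]
    return omega

Compared with the forward selection sketch above, the only change is that the coefficient moves by a fixed small step `eps` instead of jumping all the way to the projection value, and the same feature may be chosen again and again.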

An example

# Illustrations
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

# first small step epsilon*X1 along X1
plt.annotate('', xy=(3, 5), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(2.4, 4.8, s=r'$\epsilon{X_1}$', color='g')
# second small step epsilon*X1
plt.annotate('', xy=(4, 5), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(3.2, 4.8, s=r'$\epsilon{X_1}$', color='g')
# third small step epsilon*X1
plt.annotate('', xy=(5, 5), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(4.2, 4.8, s=r'$\epsilon{X_1}$', color='g')
# X1
plt.annotate('', xy=(2.8, 5), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(1.9, 4.8, s='$X_1$', color='g')
# a later step epsilon*X1, taken after the step along X2
plt.annotate('', xy=(7, 6.2), xytext=(6.1, 6.2),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(6.2, 6, s=r'$\epsilon{X_1}$', color='g')

# small step epsilon*X2 along X2
plt.annotate('', xy=(6.2, 6.2), xytext=(5, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(5.2, 5.8, s=r'$\epsilon{X_2}$', color='g')
# X2
plt.annotate('', xy=(3, 6), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2, 5.5, s='$X_2$', color='g')
# X2 translated to the current position
plt.annotate('', xy=(6, 6), xytext=(5, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5.6, 5.5, s='$X_2$', color='g')

# Y
plt.annotate('', xy=(8, 7), xytext=(2, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5, 6.2, s='$Y$', color='g')

# residual of Y after the first steps along X1
plt.annotate('', xy=(8, 7), xytext=(5, 5),
             arrowprops=dict(arrowstyle="-", color='gray'))

plt.xlim(1, 9)
plt.ylim(4, 8)
plt.title('An example of forward gradient method', fontproperties=font, fontsize=20)
plt.show()

[Figure: example of the forward gradient method (the plot produced by the code above)]

As the figure shows, assume $X$ is 2-dimensional. At first the vector closest to $Y$ is $X_1$, so the algorithm keeps walking along $X_1$ in steps of length $\epsilon$ ($\epsilon$ is a manually tuned hyperparameter). After a while the residual $Y_{err}$ becomes closer to $X_2$, so the walk continues some distance along the direction of $X_2$; then the residual becomes closer to $X_1$ again, so the walk returns to $X_1$, and so on, until the final residual is 0. Stop the algorithm and obtain $\omega$.
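This back-and-forth walk can be reproduced with a short self-contained run on made-up two-feature data (all numbers below are illustrative assumptions):

import numpy as np

# made-up, unit-length features: X1 = first column, X2 = second column
X = np.array([[1.0, 0.6],
              [0.0, 0.8],
              [0.0, 0.0]])
y = np.array([3.0, 1.0, 0.0])

eps = 0.05
omega = np.zeros(2)
residual = y.copy()
picks = []
for _ in range(1000):
    # with a fixed step the residual only shrinks to roughly the size of eps
    if np.linalg.norm(residual) < 2 * eps:
        break
    corr = X.T @ residual
    i = int(np.argmax(np.abs(corr)))   # feature currently closest to the residual
    picks.append(i)
    omega[i] += eps * np.sign(corr[i])
    residual -= eps * np.sign(corr[i]) * X[:, i]

print(omega)       # close to the exact coefficients 2.25 and 1.25
print(picks[:30])  # the first ~20 steps follow X1, then the walk alternates with X2

With a step this large the residual never reaches exactly 0; it only shrinks to roughly the size of $\epsilon$, which is precisely the trade-off discussed next.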

Advantages and disadvantages of forward gradient method

Advantages

  1. The step size $\epsilon$ can be tuned freely: making $\epsilon$ smaller lets the algorithm fit $Y$ more precisely.

Shortcomings

  1. The choice of $\epsilon$ is a trade-off: if $\epsilon$ is too small the algorithm needs many iterations to converge, and if $\epsilon$ is too large it oscillates around the optimal solution and loses accuracy. As with gradient descent, this is the main weakness of the forward gradient method. (The least angle regression method can be referred to as a remedy.)
