Forward selection method and forward gradient method
Both the forward selection method and the forward gradient method transform matrix-vector operations into vector operations in plane geometry.
Forward selection
Forward selection is a typical greedy algorithm.
The regression coefficients of a linear model are often solved with the forward selection method. For a training set with $m$ samples, each having $n$ features, assume a linear model $Y = X\omega$ can be fitted, where $Y$ is an $m \times 1$ vector, $X$ is an $m \times n$ matrix, and $\omega$ is an $n \times 1$ vector. The parameter $\omega$ of the model can be obtained by forward selection, which reduces the residual of the fit step by step.
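To make the shapes concrete, here is a minimal sketch with synthetic data (the sizes $m=100$, $n=3$ and the use of NumPy are assumptions for illustration, not part of the original derivation):

```python
import numpy as np

# Synthetic data matching the dimensions above: X is m x n, omega is n x 1, Y is m x 1.
m, n = 100, 3                         # arbitrary sizes chosen for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))           # design matrix: m samples, n features
omega_true = rng.normal(size=(n, 1))  # "true" coefficients to recover
Y = X @ omega_true                    # noiseless linear model Y = X * omega
print(X.shape, omega_true.shape, Y.shape)   # (100, 3) (3, 1) (100, 1)
```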
Cosine similarity projection
From $X$, first select the variable $X_i$ that is closest to $Y$, i.e., the one with the largest cosine similarity to $Y$, and use $X_i$ to approximate $Y$:

$$\hat{Y} = X_i\omega_i$$

where $\omega_i = \langle X_i, Y\rangle = |Y|\cos\alpha$ (assuming $X_i$ is normalized to unit length), and $\alpha$ is the angle between $X_i$ and $Y$.
Therefore, $\hat{Y}$ can be regarded as the projection of $Y$ onto $X_i$.
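As a quick numeric check of this projection step (a sketch with made-up vectors; it assumes $X_i$ has unit length so that $\langle X_i, Y\rangle = |Y|\cos\alpha$):

```python
import numpy as np

X_i = np.array([0.6, 0.8])     # unit-length feature direction
Y = np.array([2.0, 1.0])       # target vector

omega_i = X_i @ Y                            # <X_i, Y> = 2.0
cos_alpha = omega_i / np.linalg.norm(Y)      # = cos(angle between X_i and Y)
Y_hat = X_i * omega_i                        # projection of Y onto X_i: [1.2, 1.6]
print(omega_i, cos_alpha, Y_hat)
```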
The residual is then $Y_{err} = Y - \hat{Y}$. From the remaining variables $X_j\ (j = 1, 2, \ldots, n,\ j \neq i)$, select the one closest to the residual $Y_{err}$ and repeat the projection and residual computation. When the residual reaches $0$, the algorithm stops and $\omega$ is obtained.
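Putting the projection and residual steps together, a sketch of the whole loop might look like the following (hypothetical helper code, not the author's implementation; the columns of $X$ are assumed to be unit-length, and a small tolerance stands in for the exact "residual equals 0" stopping rule):

```python
import numpy as np

def forward_selection(X, Y, tol=1e-8):
    """Greedy forward selection: repeatedly project the current residual
    onto the unused column of X it is most aligned with."""
    m, n = X.shape
    omega = np.zeros(n)
    residual = np.asarray(Y, dtype=float).ravel()
    remaining = list(range(n))                    # columns not yet used
    while remaining and np.linalg.norm(residual) > tol:
        scores = X[:, remaining].T @ residual     # <X_j, Y_err> for unused columns
        k = int(np.argmax(np.abs(scores)))        # most aligned remaining column
        j = remaining.pop(k)
        omega[j] = float(scores[k])               # projection length onto X_j
        residual = residual - omega[j] * X[:, j]  # update the residual
    return omega
```

If the columns of $X$ happen to be orthonormal, this sketch recovers the least-squares solution exactly; otherwise it stops at the approximate solution discussed in the drawbacks below.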
Example
```python
# Illustrations
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

# X1*w1
plt.annotate('', xytext=(2, 5), xy=(8, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(6, 4.5, s=r'$X_1*\omega_1$', color='g')
# X2*w2
plt.annotate('', xytext=(8, 5), xy=(9.3, 7.5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(9.3, 7, s=r'$X_2*\omega_2$', color='g')
# X1
plt.annotate('', xytext=(2, 5), xy=(4, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2.5, 4.5, s='$X_1$', color='g')
# X2
plt.annotate('', xytext=(2, 5), xy=(3, 7),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2, 6, s='$X_2$', color='g')
# X2 (drawn again at the end of X1*w1)
plt.annotate('', xytext=(8, 5), xy=(9, 7),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(8.2, 6.5, s='$X_2$', color='g')
# Y
plt.annotate('', xytext=(2, 5), xy=(8, 8),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5, 7.5, s='$Y$', color='g')
# Y1: residual after projecting Y onto X1
plt.annotate('', xytext=(8, 5), xy=(8, 8),
             arrowprops=dict(arrowstyle="-", color='gray'))
plt.text(7.5, 6.5, s='$Y_1$', color='g')
# Y2: residual after projecting Y1 onto X2
plt.annotate('', xytext=(8, 8), xy=(9.3, 7.5),
             arrowprops=dict(arrowstyle="-", color='gray'))
plt.text(8.5, 8, s='$Y_2$', color='g')

plt.xlim(0, 11)
plt.ylim(2, 10)
plt.title('Examples of forward selection', fontproperties=font, fontsize=20)
plt.show()
```
(Figure: the forward selection example produced by the code above.)
As shown in the figure above, assume $X$ is 2-dimensional. $Y$ is first projected onto $X_1$, the variable closest to it, giving the red line $X_1*\omega_1$ and leaving the residual $Y_1$. Because only $X_2$ is left at this point, the residual $Y_1$ is projected onto $X_2$ to get the red line $X_2*\omega_2$. (If more variables than $X_2$ remained, we would select the $X_i$ closest to $Y_1$.) At this point $X_1\omega_1 + X_2\omega_2$ approximates $Y$, that is, $\omega = [\omega_1, \omega_2]$.
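A small numeric version of this two-step walkthrough (the vectors below are made up for illustration and are not the ones in the figure): with two non-orthogonal unit vectors, the two projections shrink the residual but do not drive it to zero, which previews the drawback discussed next.

```python
import numpy as np

X1 = np.array([1.0, 0.0])                # unit vector along the x-axis
X2 = np.array([1.0, 1.0]) / np.sqrt(2)   # unit vector at 45 degrees, not orthogonal to X1
Y = np.array([3.0, 0.5])                 # Y has the smallest angle with X1

omega_1 = X1 @ Y              # project Y onto X1: omega_1 = 3.0
Y1 = Y - omega_1 * X1         # residual Y_1 = [0.0, 0.5]
omega_2 = X2 @ Y1             # project Y_1 onto X2: omega_2 = 0.5/sqrt(2) ~ 0.354
Y2 = Y1 - omega_2 * X2        # remaining residual Y_2 = [-0.25, 0.25], not zero
print(omega_1, omega_2, Y2)
```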
Advantages and disadvantages of forward selection
Advantages
- The algorithm performs only one projection operation for each $X_i$, so it is fast.
Disadvantages
- Since the variables $X_i$ are generally not orthogonal, each projection can only reduce the residual rather than eliminate it, so the forward selection method can only give a local, approximate solution. (Consider the forward gradient method below.)
Forward gradient method
The forward gradient method also starts by selecting the vector $X_i$ that is closest to $Y$, but instead of projecting $Y$ all the way onto it, it moves only a small step of length $\epsilon$ along $X_i$. Then, from all of the variables $X_i\ (i = 1, 2, \ldots, n)$, including the one just used, it selects the vector closest to the residual $Y_{err}$ (note: the residual is computed in the same way as in the forward selection method), takes another small step, and repeats until the residual is $0$. The algorithm then stops and $\omega$ is obtained.
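A sketch of this procedure under the same assumptions as before (hypothetical code with unit-length columns; `eps`, `tol`, and `max_iter` are illustrative hyperparameters rather than anything prescribed by the method):

```python
import numpy as np

def forward_gradient(X, Y, eps=0.01, tol=1e-3, max_iter=100000):
    """Forward gradient method: at every step, move a distance eps along the
    column of X most aligned with the current residual. Unlike forward
    selection, the same column may be chosen many times."""
    m, n = X.shape
    omega = np.zeros(n)
    residual = np.asarray(Y, dtype=float).ravel()
    for _ in range(max_iter):
        if np.linalg.norm(residual) <= tol:
            break                               # residual (almost) zero: stop
        scores = X.T @ residual                 # alignment of every column with Y_err
        i = int(np.argmax(np.abs(scores)))      # column closest to the residual
        step = eps * np.sign(scores[i])         # small step with the right sign
        omega[i] += step
        residual -= step * X[:, i]              # update the residual
    return omega
```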
Example
```python
# Illustrations
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

# epsilon*X1: three small steps along X1
plt.annotate('', xytext=(2, 5), xy=(3, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(2.4, 4.8, s=r'$\epsilon{X_1}$', color='g')
plt.annotate('', xytext=(2, 5), xy=(4, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(3.2, 4.8, s=r'$\epsilon{X_1}$', color='g')
plt.annotate('', xytext=(2, 5), xy=(5, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(4.2, 4.8, s=r'$\epsilon{X_1}$', color='g')
# X1
plt.annotate('', xytext=(2, 5), xy=(2.8, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(1.9, 4.8, s='$X_1$', color='g')
# epsilon*X1: step along X1 after switching back from X2
plt.annotate('', xytext=(6.1, 6.2), xy=(7, 6.2),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(6.2, 6, s=r'$\epsilon{X_1}$', color='g')
# epsilon*X2: step along X2
plt.annotate('', xytext=(5, 5), xy=(6.2, 6.2),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(5.2, 5.8, s=r'$\epsilon{X_2}$', color='g')
# X2
plt.annotate('', xytext=(2, 5), xy=(3, 6),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2, 5.5, s='$X_2$', color='g')
# X2 (drawn again after the steps along X1)
plt.annotate('', xytext=(5, 5), xy=(6, 6),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5.6, 5.5, s='$X_2$', color='g')
# Y
plt.annotate('', xytext=(2, 5), xy=(8, 7),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5, 6.2, s='$Y$', color='g')
# residual after the steps along X1
plt.annotate('', xytext=(5, 5), xy=(8, 7),
             arrowprops=dict(arrowstyle="-", color='gray'))

plt.xlim(1, 9)
plt.ylim(4, 8)
plt.title('An example of forward gradient method', fontproperties=font, fontsize=20)
plt.show()
```
(Figure: the forward gradient example produced by the code above.)
As shown in the figure above, assume $X$ is 2-dimensional. At first, the vector closest to $Y$ is $X_1$, so the algorithm keeps walking along $X_1$, a step of length $\epsilon$ at a time ($\epsilon$ is a manually tuned hyperparameter). After walking for a while, it finds that the vector closest to the residual $Y_{err}$ is now $X_2$, so it walks some distance along the direction of $X_2$. It then finds that the residual $Y_{err}$ is closer to $X_1$ again, so it walks along $X_1$ for a while, and so on, until the final residual is $0$. The algorithm stops and $\omega$ is obtained.
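The zig-zag in this walkthrough can be reproduced with a tiny trace (illustrative numbers again; the oversized step $\epsilon = 0.5$ just makes the switches between $X_1$ and $X_2$ visible after a few iterations):

```python
import numpy as np

X = np.column_stack([[1.0, 0.0],                          # X1
                     np.array([1.0, 1.0]) / np.sqrt(2)])  # X2 (unit length)
Y = np.array([3.0, 0.5])
eps = 0.5

residual = Y.copy()
for step in range(6):
    scores = X.T @ residual                     # alignment with the residual
    i = int(np.argmax(np.abs(scores)))          # 0 -> walk along X1, 1 -> along X2
    residual = residual - eps * np.sign(scores[i]) * X[:, i]
    print(step, f'X{i + 1}', residual.round(3))
# Walks along X1 several times, then switches to X2, then back to X1.
```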
Advantages and disadvantages of forward gradient method
Advantages
- The size of the step $\epsilon$ can be controlled: the smaller $\epsilon$ is, the more precise the solution.
Disadvantages
- If $\epsilon$ is small, the number of iterations becomes very large and the algorithm is slow; if $\epsilon$ is large, the steps may overshoot and lose precision. Balancing precision and efficiency, similar to gradient descent, is a big problem of the forward gradient method. (Refer to the least angle regression method.)