# Forward selection method and forward gradient method

Both methods turn the vector operations on the matrices of a linear model into vector operations in plane geometry.

# Forward selection

Forward selection is a typical greedy algorithm.

The regression coefficients of a linear model can be solved by the forward selection method. For a training set with $m$ samples and $n$ features per sample, assume a linear model $Y = X\omega$ can be fitted, where $Y$ is an $m \times 1$ vector, $X$ is an $m \times n$ matrix, and $\omega$ is an $n \times 1$ vector. The parameter $\omega$ of the model can be obtained by forward selection.

## Cosine similarity projection

Among all features, first select the $X_i$ whose cosine similarity with $Y$ is largest (i.e., whose angle with $Y$ is smallest), and project $Y$ onto it:

$$\hat{Y} = X_i\omega_i$$

where $|\hat{Y}| = |Y|\cos\alpha$ and $\alpha$ is the angle between $X_i$ and $Y$.

Therefore, $\hat{Y}$ can be regarded as the projection of $Y$ onto $X_i$.
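As a quick numeric check of this projection step, here is a minimal NumPy sketch; the vectors are made up purely for illustration:

```python
import numpy as np

# made-up vectors, purely for illustration
Xi = np.array([2.0, 0.0, 1.0])
Y = np.array([3.0, 1.0, 2.0])

omega_i = (Xi @ Y) / (Xi @ Xi)   # projection coefficient, i.e. |Y|cos(alpha) / |X_i|
Y_hat = omega_i * Xi             # the projection of Y onto X_i
```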

Then compute the residual $Y_{err} = Y - \hat{Y}$. From the remaining features $X_j\ (j = 1, 2, \cdots, i-1, i+1, \cdots, n)$, select the one closest to the residual $Y_{err}$ and repeat the projection and residual computation, until the residual is 0. Stop the algorithm; this yields $\omega$.
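Putting the whole loop together, here is a minimal sketch assuming the features are the columns of a NumPy array `X`; the function name `forward_selection` and the tolerance `tol` are illustrative choices, not from the original text:

```python
import numpy as np

def forward_selection(X, Y, tol=1e-10):
    """Greedy forward selection: project the current residual onto the
    not-yet-used feature (column of X) it is most aligned with, using
    each feature at most once."""
    m, n = X.shape
    omega = np.zeros(n)
    residual = Y.astype(float).copy()
    remaining = list(range(n))
    while remaining and np.linalg.norm(residual) > tol:
        # cosine similarity between each remaining X_i and the residual
        cos = [X[:, j] @ residual / (np.linalg.norm(X[:, j]) * np.linalg.norm(residual))
               for j in remaining]
        i = remaining[int(np.argmax(np.abs(cos)))]       # feature closest to the residual
        w_i = X[:, i] @ residual / (X[:, i] @ X[:, i])   # projection coefficient
        omega[i] = w_i
        residual = residual - w_i * X[:, i]              # Y_err = Y_err - X_i * w_i
        remaining.remove(i)
    return omega
```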

## Example

```python
# Illustration: forward selection in the plane
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

# X1*w1 (projection of Y onto X1)
plt.annotate('', xytext=(2, 5), xy=(8, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(6, 4.5, s=r'$X_1*\omega_1$', color='g')
# X2*w2 (projection of the residual Y1 onto X2)
plt.annotate('', xytext=(8, 5), xy=(9.3, 7.5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(9.3, 7, s=r'$X_2*\omega_2$', color='g')
# X1
plt.annotate('', xytext=(2, 5), xy=(4, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2.5, 4.5, s='$X_1$', color='g')
# X2
plt.annotate('', xytext=(2, 5), xy=(3, 7),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2, 6, s='$X_2$', color='g')
# X2 translated to the end of X1*w1
plt.annotate('', xytext=(8, 5), xy=(9, 7),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(8.2, 6.5, s='$X_2$', color='g')
# Y
plt.annotate('', xytext=(2, 5), xy=(8, 8),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5, 7.5, s='$Y$', color='g')
# residual Y1 after projecting Y onto X1
plt.annotate('', xytext=(8, 5), xy=(8, 8),
             arrowprops=dict(arrowstyle="-", color='gray'))
plt.text(7.5, 6.5, s='$Y_1$', color='g')
# residual Y2 after projecting Y1 onto X2
plt.annotate('', xytext=(8, 8), xy=(9.3, 7.5),
             arrowprops=dict(arrowstyle="-", color='gray'))
plt.text(8.5, 8, s='$Y_2$', color='g')

plt.xlim(0, 11)
plt.ylim(2, 10)
plt.title('Examples of forward selection', fontproperties=font, fontsize=20)
plt.show()
```


*(Figure: forward selection illustrated in the plane, as produced by the code above.)*

In the figure, $Y$ is first projected onto $X_1$, giving the red vector $X_1\omega_1$; the remaining residual is $Y_1$. Since only $X_2$ is left, the residual $Y_1$ is then projected onto $X_2$, giving the red vector $X_2\omega_2$ (if more than one feature remained, the $X_i$ closest to $Y_1$ would be chosen). At this point $X_1\omega_1 + X_2\omega_2$ approximates $Y$, that is, $\omega = [\omega_1, \omega_2]$.
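Running the `forward_selection` sketch from earlier on a small made-up two-feature example shows both the procedure and its limitation (the numbers are purely illustrative):

```python
import numpy as np

X = np.array([[1.0, 0.2],
              [0.1, 1.0]])      # columns play the roles of X_1 and X_2
Y = X @ np.array([2.0, 1.5])    # Y built from known coefficients

print(forward_selection(X, Y))
# prints roughly [2.45, 1.37] rather than [2.0, 1.5]: X_1 and X_2 are not
# orthogonal, so one projection per feature leaves an approximation error
```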

## Advantages and disadvantages of forward selection

### Advantage

1. The algorithm does only one projection per feature $X_i$, so it is fast.

### Shortcoming

1. Since the features $X_i$ are generally not orthogonal, each projection removes only part of the residual, so the forward selection method can give only a local, approximate solution. (The forward gradient method below can be considered instead.)

# Forward gradient method

The forward gradient method is also a greedy algorithm, but instead of projecting all the way onto a feature it moves only a small step of size $\epsilon$ at a time. First select the feature $X_i$ closest to $Y$ and move a small step $\epsilon$ along its direction, then compute the residual $Y_{err}$ (computed in the same way as in the forward selection method). Next, among all features $X_i\ (i = 1, 2, \cdots, n)$, including the one just used, select the one closest to the residual $Y_{err}$ and take another small step, and so on, until the residual is 0. Stop the algorithm; this yields $\omega$.
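A minimal sketch of this procedure, reusing the column-matrix convention from before; the function name `forward_gradient`, the step size `eps`, and the tolerance `tol` are illustrative choices, and in practice the loop stops when the residual is only approximately zero:

```python
import numpy as np

def forward_gradient(X, Y, eps=0.01, tol=1e-3, max_iter=100_000):
    """Take a small step of size eps along the feature (column of X) most
    aligned with the current residual; the same feature may be chosen
    many times. Stops when the residual is approximately zero."""
    m, n = X.shape
    omega = np.zeros(n)
    residual = Y.astype(float).copy()
    col_norms = np.linalg.norm(X, axis=0)
    for _ in range(max_iter):
        if np.linalg.norm(residual) < tol:
            break
        # cosine similarity of every feature with the residual
        cos = X.T @ residual / (col_norms * np.linalg.norm(residual))
        i = int(np.argmax(np.abs(cos)))   # feature closest to the residual
        step = eps * np.sign(cos[i])      # signed small step along X_i
        omega[i] += step
        residual -= step * X[:, i]        # the residual shrinks a little
    return omega
```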

## Example

```python
# Illustration: forward gradient method in the plane
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline
font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

# first small step eps*X1 along X1
plt.annotate('', xytext=(2, 5), xy=(3, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(2.4, 4.8, s=r'$\epsilon{X_1}$', color='g')
# second step eps*X1
plt.annotate('', xytext=(2, 5), xy=(4, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(3.2, 4.8, s=r'$\epsilon{X_1}$', color='g')
# third step eps*X1
plt.annotate('', xytext=(2, 5), xy=(5, 5),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(4.2, 4.8, s=r'$\epsilon{X_1}$', color='g')
# the vector X1 itself
plt.annotate('', xytext=(2, 5), xy=(2.8, 5),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(1.9, 4.8, s='$X_1$', color='g')
# later step eps*X1 after moving along X2
plt.annotate('', xytext=(6.1, 6.2), xy=(7, 6.2),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(6.2, 6, s=r'$\epsilon{X_1}$', color='g')

# step eps*X2 along X2
plt.annotate('', xytext=(5, 5), xy=(6.2, 6.2),
             arrowprops=dict(arrowstyle="->", color='r'))
plt.text(5.2, 5.8, s=r'$\epsilon{X_2}$', color='g')
# the vector X2 itself
plt.annotate('', xytext=(2, 5), xy=(3, 6),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(2, 5.5, s='$X_2$', color='g')
# X2 translated to the current position
plt.annotate('', xytext=(5, 5), xy=(6, 6),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5.6, 5.5, s='$X_2$', color='g')

# Y
plt.annotate('', xytext=(2, 5), xy=(8, 7),
             arrowprops=dict(arrowstyle="->", color='k'))
plt.text(5, 6.2, s='$Y$', color='g')

# current residual Y_err
plt.annotate('', xytext=(5, 5), xy=(8, 7),
             arrowprops=dict(arrowstyle="-", color='gray'))

plt.xlim(1, 9)
plt.ylim(4, 8)
plt.title('An example of forward gradient method', fontproperties=font, fontsize=20)
plt.show()
```


*(Figure: the forward gradient method illustrated in the plane, as produced by the code above.)*

As the figure shows, suppose $X$ is 2-dimensional. At first the vector closest to $Y$ is $X_1$, so we keep taking small steps of size $\epsilon$ along $X_1$ ($\epsilon$ is a manually tuned hyperparameter). After several steps the residual $Y_{err}$ becomes closer to $X_2$, so we move some distance along $X_2$; then the residual becomes closer to $X_1$ again and we move along $X_1$ once more, and so on, until the residual is finally 0. Stop the algorithm, and $\omega$ is obtained.
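Continuing the made-up two-feature example with the `forward_gradient` sketch above, shrinking `eps` trades more iterations for a more accurate $\omega$:

```python
import numpy as np

X = np.array([[1.0, 0.2],
              [0.1, 1.0]])
Y = X @ np.array([2.0, 1.5])

print(forward_gradient(X, Y, eps=0.1))    # coarse steps: only a rough estimate
print(forward_gradient(X, Y, eps=0.001))  # finer steps: much closer to [2.0, 1.5]
```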

## Advantages and disadvantages of forward gradient method

### Advantage

1. The step size $\epsilon$ can be tuned: making $\epsilon$ smaller yields a more accurate solution.

### Shortcoming

1. When $\epsilon$ is small, the number of iterations becomes very large and the computation is expensive, similar to gradient descent; this is the main drawback of the forward gradient method. (Refer to the least angle regression method.)
