# Machine Learning in Action Notes: Support Vector Machine

#### Support Vector Machine

Mathematical derivation

Prerequisites: the Lagrange multiplier method, the dual problem, and the kernel trick

##### Lagrange multiplier method

For a constrained optimization problem, the Lagrange multiplier method folds the constraint into the objective.

Example:

Given $x>0$, $y>0$, and $x+2y+2xy=8$, find the minimum value of $x+2y$.


Solution:

Introduce a multiplier $\lambda$ and construct the Lagrange function

$$L(x, y, \lambda) = x + 2y + \lambda(x + 2y + 2xy - 8)$$

Set the partial derivatives with respect to $x$, $y$, and $\lambda$ to zero:

$$\begin{aligned} L_x &= 1 + \lambda(1 + 2y) = 0 \\ L_y &= 2 + \lambda(2 + 2x) = 0 \\ L_\lambda &= x + 2y + 2xy - 8 = 0 \end{aligned}$$

Three equations in three unknowns; solving gives $x = 2$ and $y = 1$ (with $\lambda = -1/3$).

That is, when x = 2 and y = 1, the minimum value of x+2y is 4.
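The same system can be checked mechanically; below is a minimal sketch assuming SymPy is installed (SymPy is not used anywhere else in these notes):

```python
# Solve the Lagrange system for the example above with SymPy (assumed dependency).
import sympy as sp

x, y = sp.symbols('x y', positive=True)
lam = sp.symbols('lam')
L = x + 2*y + lam*(x + 2*y + 2*x*y - 8)

# All three partial derivatives must vanish at the constrained optimum.
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, [x, y, lam], dict=True)
print(sol)                      # [{x: 2, y: 1, lam: -1/3}]
print(sol[0][x] + 2*sol[0][y])  # minimum value of x + 2y: 4
```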

##### Dual problem

A transformation between optimization problems.

For example: converting a max-min problem into a min-max problem.

By default, a constrained optimization problem and its dual satisfy only weak duality. When the KKT conditions are satisfied, strong duality holds; that is, the two problems are equivalent.
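Written out, with $p^*$ the primal optimum and $d^*$ the dual optimum, the relationship (a standard fact, stated here for reference) is:

$$d^* = \max_{\alpha}\,\min_{x}\, L(x, \alpha) \;\le\; \min_{x}\,\max_{\alpha}\, L(x, \alpha) = p^*$$

Weak duality ($d^* \le p^*$) always holds; strong duality ($d^* = p^*$) holds for the SVM problem because the problem is convex and the KKT conditions are satisfied at the optimum.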

##### Kernel trick

When features are expanded into a very high-dimensional (even infinite-dimensional) space, computing the mapped features explicitly becomes too expensive as the number of samples or the hyperplane dimension grows. The kernel trick avoids this by computing inner products in the feature space directly from the original inputs, keeping the problem finite and tractable.
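A minimal sketch of the idea, using the textbook degree-2 polynomial kernel rather than anything from the book's code: the kernel value computed in the original 2-D space equals an inner product in a 3-D feature space, without ever building the feature map explicitly.

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for 2-D input: (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(np.dot(x, z) ** 2)       # kernel k(x, z) = (x . z)^2, computed in 2-D: 16.0
print(np.dot(phi(x), phi(z)))  # same value via the explicit 3-D map (up to rounding)
```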

##### Machine Learning in Action

This chapter studies the most popular implementation of support vector machines: the sequential minimal optimization (SMO) algorithm.

Advantages: low generalization error, low computational overhead, and results that are easy to interpret.
Disadvantages: sensitive to parameter tuning and to the choice of kernel function; without modification, the basic classifier only handles binary classification.
Applicable data types: numerical and nominal values.


##### Principles

A separating hyperplane is a surface that divides data points of different classes. A support vector machine is a classifier built from such separating hyperplanes.

**Support vectors are the points closest to the separating hyperplane. Maximizing the margin of the support vectors is the optimization goal in constructing a support vector machine.**
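The margin is measured with the usual point-to-hyperplane distance (stated here for reference):

$$\operatorname{dist}(x; w, b) = \frac{\lvert w^{\top} x + b \rvert}{\lVert w \rVert}$$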

In the figure, lines a, b, and c can all be considered separating hyperplanes. Clearly, hyperplane b is more robust, that is, it generalizes better, than a and c.

How do we set up a mathematical model that finds a high-quality separating hyperplane?

We therefore need an optimization objective: maximize the margin to the points closest to the separating hyperplane,

$$\arg\max_{w,b}\ \left\{ \frac{1}{\lVert w\rVert}\, \min_i\ y_i\,(w^{\top} x_i + b) \right\}$$
Here $w$ and $b$ are the parameters to be optimized. Solving the model directly is very difficult, so a series of transformations is applied. Readers who are interested can go deeper with the video at the beginning of the article and the watermelon book; here we only outline the process.
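For reference, after fixing the scale so that the support vectors satisfy $y_i(w^{\top}x_i + b) = 1$, the objective reduces to the standard primal form:

$$\min_{w,b}\ \frac{1}{2}\lVert w\rVert^2 \quad \text{s.t.} \quad y_i(w^{\top}x_i + b) \ge 1,\ \ i = 1,\dots,m$$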

• The Lagrange multiplier method yields the Lagrange function
• The Lagrange function turns the problem into an $\arg\min\max$ problem
• Using strong duality (the KKT conditions), the $\arg\min\max$ problem is converted into an $\arg\max\min$ problem
• Because the sample space is not necessarily linearly separable, slack variables are added

After these transformations the optimization variable becomes $\alpha$; the next step is to solve the resulting dual problem.
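The resulting soft-margin dual, in its standard form (this is the problem the code below solves):

$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\, y_i y_j\, x_i^{\top}x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{m}\alpha_i y_i = 0$$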

The method used in Machine Learning in Action is the SMO algorithm.

SMO algorithm pseudocode:

```
Create an alpha vector and initialize it to all zeros
While the number of iterations is less than the maximum (outer loop):
    For each data vector in the dataset (inner loop):
        If the data vector can be optimized:
            Randomly select another data vector
            Optimize the two vectors together
            If neither vector can be optimized, exit the inner loop
    If no alpha pairs were changed, increment the iteration count and continue
```


In fact, SMO is an efficient greedy algorithm: at each step it analytically optimizes a pair of $\alpha$ values while holding all the others fixed.
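Why pairs? The dual constraint $\sum_i \alpha_i y_i = 0$ couples the variables, so a single $\alpha$ cannot move alone; SMO therefore updates two at a time with a closed-form step (written here in the same convention as the code below, where $\eta \le 0$):

$$\eta = 2K_{ij} - K_{ii} - K_{jj},\qquad \alpha_j \leftarrow \operatorname{clip}\!\left(\alpha_j - \frac{y_j(E_i - E_j)}{\eta},\ L,\ H\right),\qquad \alpha_i \leftarrow \alpha_i + y_i y_j\,(\alpha_j^{\text{old}} - \alpha_j)$$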

##### Code

```python
'''
Created on Nov 4, 2010
Chapter 5 source file for Machine Learning in Action
@author: Peter
'''
from numpy import *
from time import sleep

def loadDataSet(fileName):
    dataMat = []
    labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = line.strip().split('\t')
        dataMat.append([float(lineArr[0]), float(lineArr[1])])
        labelMat.append(float(lineArr[2]))
    return dataMat, labelMat

def selectJrand(i, m):
    j = i  # we want to select any J not equal to i
    while (j == i):
        j = int(random.uniform(0, m))
    return j

def clipAlpha(aj, H, L):
    if aj > H:
        aj = H
    if L > aj:
        aj = L
    return aj

def smoSimple(dataMatIn, classLabels, C, toler, maxIter):
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).transpose()
    b = 0
    m, n = shape(dataMatrix)
    alphas = mat(zeros((m, 1)))
    iter = 0
    while (iter < maxIter):
        alphaPairsChanged = 0
        for i in range(m):
            fXi = float(multiply(alphas, labelMat).T * (dataMatrix * dataMatrix[i, :].T)) + b
            Ei = fXi - float(labelMat[i])  # check whether this example violates the KKT conditions
            if ((labelMat[i] * Ei < -toler) and (alphas[i] < C)) or ((labelMat[i] * Ei > toler) and (alphas[i] > 0)):
                j = selectJrand(i, m)
                fXj = float(multiply(alphas, labelMat).T * (dataMatrix * dataMatrix[j, :].T)) + b
                Ej = fXj - float(labelMat[j])
                alphaIold = alphas[i].copy()
                alphaJold = alphas[j].copy()
                if (labelMat[i] != labelMat[j]):
                    L = max(0, alphas[j] - alphas[i])
                    H = min(C, C + alphas[j] - alphas[i])
                else:
                    L = max(0, alphas[j] + alphas[i] - C)
                    H = min(C, alphas[j] + alphas[i])
                if L == H:
                    print("L==H")
                    continue
                eta = (2.0 * dataMatrix[i, :] * dataMatrix[j, :].T
                       - dataMatrix[i, :] * dataMatrix[i, :].T
                       - dataMatrix[j, :] * dataMatrix[j, :].T)
                if eta >= 0:
                    print("eta>=0")
                    continue
                alphas[j] -= labelMat[j] * (Ei - Ej) / eta
                alphas[j] = clipAlpha(alphas[j], H, L)
                if (abs(alphas[j] - alphaJold) < 0.00001):
                    print("j not moving enough")
                    continue
                alphas[i] += labelMat[j] * labelMat[i] * (alphaJold - alphas[j])  # update i by the same amount as j, in the opposite direction
                b1 = (b - Ei
                      - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :] * dataMatrix[i, :].T
                      - labelMat[j] * (alphas[j] - alphaJold) * dataMatrix[i, :] * dataMatrix[j, :].T)
                b2 = (b - Ej
                      - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :] * dataMatrix[j, :].T
                      - labelMat[j] * (alphas[j] - alphaJold) * dataMatrix[j, :] * dataMatrix[j, :].T)
                if (0 < alphas[i]) and (C > alphas[i]):
                    b = b1
                elif (0 < alphas[j]) and (C > alphas[j]):
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                alphaPairsChanged += 1
                print("iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
        if (alphaPairsChanged == 0):
            iter += 1
        else:
            iter = 0
        print("iteration number: %d" % iter)
    return b, alphas

def kernelTrans(X, A, kTup):  # calc the kernel or transform data to a higher dimensional space
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':
        K = X * A.T  # linear kernel
    elif kTup[0] == 'rbf':
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))  # divide in NumPy is element-wise, not matrix-wise like in Matlab
    else:
        raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
    return K

class optStruct:
    def __init__(self, dataMatIn, classLabels, C, toler, kTup):  # initialize the structure with the parameters
        self.X = dataMatIn
        self.labelMat = classLabels
        self.C = C
        self.tol = toler
        self.m = shape(dataMatIn)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros((self.m, 2)))  # first column is a valid flag
        self.K = mat(zeros((self.m, self.m)))
        for i in range(self.m):
            self.K[:, i] = kernelTrans(self.X, self.X[i, :], kTup)

def calcEk(oS, k):
    fXk = float(multiply(oS.alphas, oS.labelMat).T * oS.K[:, k] + oS.b)
    Ek = fXk - float(oS.labelMat[k])
    return Ek

def selectJ(i, oS, Ei):  # second-choice heuristic: pick j to maximize |Ei - Ej|
    maxK = -1
    maxDeltaE = 0
    Ej = 0
    oS.eCache[i] = [1, Ei]  # set valid; choose the alpha that gives the maximum delta E
    validEcacheList = nonzero(oS.eCache[:, 0].A)[0]
    if (len(validEcacheList)) > 1:
        for k in validEcacheList:  # loop through valid Ecache values and find the one that maximizes delta E
            if k == i:
                continue  # don't calc for i, a waste of time
            Ek = calcEk(oS, k)
            deltaE = abs(Ei - Ek)
            if (deltaE > maxDeltaE):
                maxK = k
                maxDeltaE = deltaE
                Ej = Ek
        return maxK, Ej
    else:  # in this case (first time around) we don't have any valid eCache values
        j = selectJrand(i, oS.m)
        Ej = calcEk(oS, j)
    return j, Ej

def updateEk(oS, k):  # after any alpha has changed, update the new value in the cache
    Ek = calcEk(oS, k)
    oS.eCache[k] = [1, Ek]

def innerL(i, oS):
    Ei = calcEk(oS, i)
    if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \
            ((oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)):
        j, Ej = selectJ(i, oS, Ei)  # this has been changed from selectJrand
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        if (oS.labelMat[i] != oS.labelMat[j]):
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print("L==H")
            return 0
        eta = 2.0 * oS.K[i, j] - oS.K[i, i] - oS.K[j, j]  # changed for kernel
        if eta >= 0:
            print("eta>=0")
            return 0
        oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        updateEk(oS, j)  # added for the Ecache
        if (abs(oS.alphas[j] - alphaJold) < 0.00001):
            print("j not moving enough")
            return 0
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])  # update i by the same amount as j, in the opposite direction
        updateEk(oS, i)  # added for the Ecache
        b1 = (oS.b - Ei
              - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, i]
              - oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.K[i, j])
        b2 = (oS.b - Ej
              - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, j]
              - oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.K[j, j])
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
            oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
            oS.b = b2
        else:
            oS.b = (b1 + b2) / 2.0
        return 1
    else:
        return 0

def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)):  # full Platt SMO
    oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler, kTup)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
        alphaPairsChanged = 0
        if entireSet:  # go over all
            for i in range(oS.m):
                alphaPairsChanged += innerL(i, oS)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        else:  # go over non-bound (railed) alphas
            nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += innerL(i, oS)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False  # toggle entire-set loop
        elif (alphaPairsChanged == 0):
            entireSet = True
        print("iteration number: %d" % iter)
    return oS.b, oS.alphas

def calcWs(alphas, dataArr, classLabels):
    X = mat(dataArr)
    labelMat = mat(classLabels).transpose()
    m, n = shape(X)
    w = zeros((n, 1))
    for i in range(m):
        w += multiply(alphas[i] * labelMat[i], X[i, :].T)
    return w

def testRbf(k1=1.3):
    dataArr, labelArr = loadDataSet('testSetRBF.txt')  # training data from the book's data files
    b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1))  # C=200 important
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    svInd = nonzero(alphas.A > 0)[0]
    sVs = datMat[svInd]  # get a matrix of only the support vectors
    labelSV = labelMat[svInd]
    print("there are %d Support Vectors" % shape(sVs)[0])
    m, n = shape(datMat)
    errorCount = 0
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]): errorCount += 1
    print("the training error rate is: %f" % (float(errorCount) / m))
    errorCount = 0
    dataArr, labelArr = loadDataSet('testSetRBF2.txt')  # separate test set
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    m, n = shape(datMat)
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]): errorCount += 1
    print("the test error rate is: %f" % (float(errorCount) / m))

def img2vector(filename):
    returnVect = zeros((1, 1024))
    fr = open(filename)
    for i in range(32):
        lineStr = fr.readline()  # read one 32-character row of the digit image
        for j in range(32):
            returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect

def loadImages(dirName):
    from os import listdir
    hwLabels = []
    trainingFileList = listdir(dirName)  # load the training set
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]  # take off .txt
        classNumStr = int(fileStr.split('_')[0])
        if classNumStr == 9:
            hwLabels.append(-1)
        else:
            hwLabels.append(1)
        trainingMat[i, :] = img2vector('%s/%s' % (dirName, fileNameStr))
    return trainingMat, hwLabels

def testDigits(kTup=('rbf', 10)):
    dataArr, labelArr = loadImages('trainingDigits')  # handwritten-digit training directory
    b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup)
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    svInd = nonzero(alphas.A > 0)[0]
    sVs = datMat[svInd]
    labelSV = labelMat[svInd]
    print("there are %d Support Vectors" % shape(sVs)[0])
    m, n = shape(datMat)
    errorCount = 0
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], kTup)
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]): errorCount += 1
    print("the training error rate is: %f" % (float(errorCount) / m))
    errorCount = 0
    dataArr, labelArr = loadImages('testDigits')  # separate test directory
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    m, n = shape(datMat)
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], kTup)
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]): errorCount += 1
    print("the test error rate is: %f" % (float(errorCount) / m))

'''#######********************************
Non-Kernel Versions below
'''  #######********************************

class optStructK:
    def __init__(self, dataMatIn, classLabels, C, toler):  # initialize the structure with the parameters
        self.X = dataMatIn
        self.labelMat = classLabels
        self.C = C
        self.tol = toler
        self.m = shape(dataMatIn)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros((self.m, 2)))  # first column is a valid flag

def calcEkK(oS, k):
    fXk = float(multiply(oS.alphas, oS.labelMat).T * (oS.X * oS.X[k, :].T)) + oS.b
    Ek = fXk - float(oS.labelMat[k])
    return Ek

def selectJK(i, oS, Ei):  # second-choice heuristic: pick j to maximize |Ei - Ej|
    maxK = -1
    maxDeltaE = 0
    Ej = 0
    oS.eCache[i] = [1, Ei]  # set valid; choose the alpha that gives the maximum delta E
    validEcacheList = nonzero(oS.eCache[:, 0].A)[0]
    if (len(validEcacheList)) > 1:
        for k in validEcacheList:  # loop through valid Ecache values and find the one that maximizes delta E
            if k == i:
                continue  # don't calc for i, a waste of time
            Ek = calcEkK(oS, k)  # the non-kernel helpers must call each other, not the kernel versions
            deltaE = abs(Ei - Ek)
            if (deltaE > maxDeltaE):
                maxK = k
                maxDeltaE = deltaE
                Ej = Ek
        return maxK, Ej
    else:  # in this case (first time around) we don't have any valid eCache values
        j = selectJrand(i, oS.m)
        Ej = calcEkK(oS, j)
    return j, Ej

def updateEkK(oS, k):  # after any alpha has changed, update the new value in the cache
    Ek = calcEkK(oS, k)
    oS.eCache[k] = [1, Ek]

def innerLK(i, oS):
    Ei = calcEkK(oS, i)
    if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \
            ((oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)):
        j, Ej = selectJK(i, oS, Ei)  # this has been changed from selectJrand
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        if (oS.labelMat[i] != oS.labelMat[j]):
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print("L==H")
            return 0
        eta = 2.0 * oS.X[i, :] * oS.X[j, :].T - oS.X[i, :] * oS.X[i, :].T - oS.X[j, :] * oS.X[j, :].T
        if eta >= 0:
            print("eta>=0")
            return 0
        oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        updateEkK(oS, j)  # added for the Ecache
        if (abs(oS.alphas[j] - alphaJold) < 0.00001):
            print("j not moving enough")
            return 0
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])  # update i by the same amount as j, in the opposite direction
        updateEkK(oS, i)  # added for the Ecache
        b1 = (oS.b - Ei
              - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.X[i, :] * oS.X[i, :].T
              - oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.X[i, :] * oS.X[j, :].T)
        b2 = (oS.b - Ej
              - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.X[i, :] * oS.X[j, :].T
              - oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.X[j, :] * oS.X[j, :].T)
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
            oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
            oS.b = b2
        else:
            oS.b = (b1 + b2) / 2.0
        return 1
    else:
        return 0

def smoPK(dataMatIn, classLabels, C, toler, maxIter):  # full Platt SMO, non-kernel version
    oS = optStructK(mat(dataMatIn), mat(classLabels).transpose(), C, toler)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
        alphaPairsChanged = 0
        if entireSet:  # go over all
            for i in range(oS.m):
                alphaPairsChanged += innerLK(i, oS)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        else:  # go over non-bound (railed) alphas
            nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += innerLK(i, oS)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False  # toggle entire-set loop
        elif (alphaPairsChanged == 0):
            entireSet = True
        print("iteration number: %d" % iter)
    return oS.b, oS.alphas
```
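A short usage sketch, assuming the book's testSet.txt data file is available in the working directory: train with smoP, recover $w$ with calcWs, and classify a point by the sign of $w^{\top}x + b$.

```python
# Usage sketch -- 'testSet.txt' is the tab-separated dataset used in the book.
dataArr, labelArr = loadDataSet('testSet.txt')
b, alphas = smoP(dataArr, labelArr, 0.6, 0.001, 40)
ws = calcWs(alphas, dataArr, labelArr)

# Classify the first training point: sign(w^T x + b) should match labelArr[0].
x0 = mat(dataArr)[0]
print(sign(float(x0 * mat(ws) + b)), labelArr[0])
```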

```python
import svm as svmMLiA  # assumes the listing above is saved as svm.py
import numpy as np
import matplotlib.pyplot as plt

# Load the data first (assumed step: the book's testSet.txt, read with
# the loadDataSet function defined above).
dataArr, labelArr = svmMLiA.loadDataSet('testSet.txt')

# Arguments: dataset, class labels, constant C, tolerance, maximum number of iterations before exit
b, alphas = svmMLiA.smoSimple(dataArr, labelArr, 0.6, 0.001, 40)

# Chinese text in matplotlib plots renders incorrectly without a default font;
# these two lines set it.
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Set up the axes and their labels.
plt.xlabel('X')
plt.ylabel('Y')
plt.xlim(xmax=12, xmin=-2.5)
plt.ylim(ymax=10, ymin=-10)

# Split the points by class label.
x1 = list(); x2 = list()
y1 = list(); y2 = list()
for j in range(len(labelArr)):
    if labelArr[j] == 1:
        x1.append(dataArr[j][0])
        y1.append(dataArr[j][1])
    else:
        x2.append(dataArr[j][0])
        y2.append(dataArr[j][1])
print(x1)
print(y1)

colors1 = '#00CED1'
colors2 = '#DC143C'
area = np.pi * 4 ** 2

plt.scatter(x1, y1, s=area, c=colors1, alpha=0.4, label='category A')
plt.scatter(x2, y2, s=area, c=colors2, alpha=0.4, label='category B')
plt.plot()
plt.legend()
plt.savefig('1.png', dpi=300)
plt.show()
```
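The script above only scatters the two classes. A natural follow-up, sketched here under the assumption that b, alphas, dataArr, and labelArr from the script are still in scope, is to recover $w$ with calcWs and overlay the separating line $w_1 x + w_2 y + b = 0$:

```python
# Overlay the learned separating hyperplane (assumes b, alphas, dataArr,
# and labelArr from the script above are still defined).
ws = svmMLiA.calcWs(alphas, dataArr, labelArr)  # w = sum_i alpha_i * y_i * x_i
w1, w2 = float(ws[0]), float(ws[1])

xs = np.linspace(-2.5, 12, 100)
ys = (-float(b) - w1 * xs) / w2  # solve w1*x + w2*y + b = 0 for y
plt.plot(xs, ys)
plt.show()
```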


##### Epilogue

• SVM is the product of rigorous mathematical derivation.
• When the data distribution is too uniform, it is difficult to extract a useful feature set.
