SVM in machine learning practice

1, Linear SVM

1.1 Maximum margin and classification

Find a hyperplane in the sample space that separates the two classes of samples and, at the same time, lies as far from the samples as possible. Such a hyperplane sits midway between the closest samples of the two classes, and these edge samples are called the support vectors.

Finding the maximum margin amounts to maximizing the distance between the two classes: the distance between the positive and negative support vectors is twice the distance from a positive support vector to the separating hyperplane.

The distance from a sample x to the hyperplane w^T x + b = 0 is |w^T x + b| / ||w||. For a support vector the numerator |w^T x + b| equals 1, so its distance is 1 / ||w|| and the margin is 2 / ||w||.

The problem therefore becomes

max_{w,b} 2 / ||w||   s.t.  y_i (w^T x_i + b) >= 1,  i = 1, ..., m

Since the support vectors satisfy |w^T x_i + b| = 1, every other sample satisfies y_i (w^T x_i + b) >= 1. The constraint is easy to justify: if sample i is classified correctly, y_i has the same sign as w^T x_i + b, so their product is positive.

Converting the maximization into a minimization gives

min_{w,b} (1/2) ||w||^2   s.t.  y_i (w^T x_i + b) >= 1,  i = 1, ..., m

1.2 Dual problem

The minimum under these constraints is found with the Lagrange multiplier method. Introducing a multiplier α_i >= 0 for each constraint gives the Lagrangian

L(w, b, α) = (1/2) ||w||^2 + Σ_i α_i (1 - y_i (w^T x_i + b))

Setting the partial derivatives of L with respect to w and b to zero yields

w = Σ_i α_i y_i x_i,   Σ_i α_i y_i = 0

Substituting these back into the Lagrangian eliminates w and b and leaves the dual problem

max_α  Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
s.t.   Σ_i α_i y_i = 0,  α_i >= 0

In the end only the unknowns α_i remain.

1.3 SMO algorithm flow

The dual problem is solved with SMO (Sequential Minimal Optimization). SMO repeatedly selects a pair of multipliers (α_i, α_j), fixes all the others, solves the resulting two-variable sub-problem analytically, clips the result to its feasible interval, and updates b; it stops when no multiplier changes any more (the update equations are listed below).

Finally α is solved, and w and b are recovered from it.
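For reference, these are the two-variable update equations that the implementation in section 3.2 (innerL) follows, with E_k = f(x_k) - y_k the prediction error and K_ij the kernel value:

η = 2 K_ij - K_ii - K_jj
α_j_new = α_j_old - y_j (E_i - E_j) / η, then clipped to the interval [L, H]
α_i_new = α_i_old + y_i y_j (α_j_old - α_j_new)
b_1 = b - E_i - y_i (α_i_new - α_i_old) K_ii - y_j (α_j_new - α_j_old) K_ij
b_2 = b - E_j - y_i (α_i_new - α_i_old) K_ij - y_j (α_j_new - α_j_old) K_jj

where the interval [L, H] comes from the box constraint 0 <= α <= C and from whether y_i equals y_j.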

2, Nonlinear SVM

2.1 Kernel functions and the kernel trick

When the samples are not linearly separable, a kernel function is chosen to map them into a higher-dimensional space in which they become linearly separable.

Here ϕ denotes the mapping: let ϕ(x) be the vector obtained by mapping sample x; the separating hyperplane in the feature space is then f(x) = w^T ϕ(x) + b.

Solving the problem is therefore split into two steps:
First, a nonlinear mapping transforms the data into a feature space F;
Then, a linear learner is used in that feature space.
Common kernel functions include the following:
Linear kernel: k(x_i, x_j) = x_i^T x_j
Polynomial kernel: k(x_i, x_j) = (x_i^T x_j)^d
Gaussian (RBF) kernel: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
Sigmoid kernel: k(x_i, x_j) = tanh(β x_i^T x_j + θ)

If there is a way to compute the inner product <ϕ(x_i), ϕ(x)> in the feature space directly, as a function of the original input points, then the two steps can be merged to build a nonlinear learner. Such a direct-computation function is called a kernel function, and the method of replacing the inner product with a kernel function is called the kernel trick.
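As a small illustration (not from the original post), the snippet below checks numerically that the degree-2 polynomial kernel (x^T z)^2 equals the inner product of an explicit quadratic feature map ϕ, which is exactly the identity the kernel trick exploits:

import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-dimensional input:
    # phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    # Degree-2 polynomial kernel, computed directly in the input space
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))  # inner product after the explicit mapping: 16.0
print(poly_kernel(x, z))       # same value without ever computing phi:    16.0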

2.3 Soft margin and regularization

In practical applications it is hard to choose a kernel function that makes the samples exactly linearly separable in the feature space; and even when one is found, it is hard to tell whether the apparent separability is simply the result of overfitting. Therefore the concept of a "soft margin" is introduced, which allows the SVM to violate the constraint on some samples.


While maximizing the margin, the number of samples that violate the constraint should be kept as small as possible, so the problem becomes

min_{w,b} (1/2) ||w||^2 + C Σ_i ℓ_{0/1}(y_i (w^T x_i + b) - 1)

where C > 0 is the penalty parameter and ℓ_{0/1}(z) is the 0/1 loss, equal to 1 when z < 0 and 0 otherwise.

However, the 0/1 loss has some defects: it is non-convex and discontinuous, which makes the problem hard to optimize, so surrogate losses are used in its place. The usual choice is the hinge loss max(0, 1 - z), which gives

min_{w,b} (1/2) ||w||^2 + C Σ_i max(0, 1 - y_i (w^T x_i + b))

2.4 Support vector regression

Support vector regression (SVR) tolerates a deviation of up to ε between the model output f(x) and the true output y; loss is incurred only when |f(x) - y| > ε, which corresponds to an insensitive band of total width 2ε centred on f(x).

Here ε is the width of the insensitive band and reflects how tolerant the model is of deviations and outliers.

The problem to solve therefore becomes

min_{w,b} (1/2) ||w||^2 + C Σ_i ℓ_ε(f(x_i) - y_i)

where the ε-insensitive loss is ℓ_ε(z) = 0 if |z| <= ε, and |z| - ε otherwise.

3, Code practice

3.1 Data preparation

The dataset is the same one used in the previous post on logistic regression: the breast cancer dataset, loaded with load_breast_cancer from sklearn.
It contains 569 samples with two possible labels: 0 is malignant and 1 is benign.
Each sample has 30 features.
This time only the first 100 samples were selected; 85 were used as the training set and 15 as the test set.
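Below is a minimal sketch of this preparation step (the original post does not show the exact code, so the split is an assumption). Note that the SMO formulation in section 3.2 expects labels in {-1, +1}, so the {0, 1} labels are remapped:

import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data[:100], data.target[:100]   # keep only the first 100 samples
y = np.where(y == 0, -1, 1)                 # remap labels {0, 1} -> {-1, +1} for the SVM

trainX, trainY = X[:85], y[:85]             # 85 samples for training
testX, testY = X[85:], y[85:]               # 15 samples for testing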

3.2 Algorithm implementation

Data structure

import numpy as np


class optStruct:
    """
	Data structure to maintain all values that need to be operated
	Parameters: 
		dataMatIn - Data matrix
		classLabels - Data label
		C - Penalty parameter (controls the tolerance of outliers)
		toler - Tolerance used when checking the KKT conditions
		kTup - Tuple with the kernel information: the first element is the kernel type, the second element holds the parameter the kernel needs
	"""

    def __init__(self, dataMatIn, classLabels, C, toler, kTup):
        self.X = dataMatIn  # Data matrix
        self.labelMat = classLabels  # Data label
        self.C = C  # Penalty parameter -> controls the tolerance of outliers
        self.tol = toler  # Tolerance used when checking the KKT conditions
        self.m = np.shape(dataMatIn)[0]  # Data matrix rows
        self.alphas = np.mat(np.zeros((self.m, 1)))  # Initialize the alpha parameter to 0 according to the number of rows of the matrix
        self.b = 0  # Initialization b parameter is 0
        self.eCache = np.mat(np.zeros((self.m, 2)))  # Initialize the error buffer according to the number of rows of the matrix. The first column is the valid flag bit, and the second column is the value of the actual error E.
        self.K = np.mat(np.zeros((self.m, self.m)))  # Initialize kernel K
        for i in range(self.m):  # Calculate the kernel K of all data
            self.K[:, i] = kernelTrans(self.X, self.X[i, :], kTup)

Kernel computation: the kernel function maps the data into a higher-dimensional space.

def kernelTrans(X, A, kTup):
    """
	Compute the kernel column between every row of X and the single sample A
	Parameters:
		X - Data matrix
		A - A single data row (vector)
		kTup - Tuple containing kernel information
	Returns:
		K - The computed kernel column
	"""
    m, n = np.shape(X)
    K = np.mat(np.zeros((m, 1)))
    if kTup[0] == 'lin':
        K = X * A.T  # Linear kernel function, only inner product.
    elif kTup[0] == 'rbf':  # Gaussian kernel function, calculated according to the Gaussian kernel function formula
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = np.exp(K / (-1 * kTup[1] ** 2))  # Computing Gaussian kernel K
    else:
        raise NameError('Kernel unrecognized')
    return K  # Returns the calculated kernel K
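For example (illustrative values; trainX is from the data-preparation sketch above), the linear and Gaussian kernels are requested through kTup like this:

K_lin = kernelTrans(np.mat(trainX), np.mat(trainX)[0, :], ('lin', 0))    # linear kernel column
K_rbf = kernelTrans(np.mat(trainX), np.mat(trainX)[0, :], ('rbf', 1.3))  # Gaussian kernel column, sigma = 1.3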

SMO algorithm

def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)):
    """
	Full Platt SMO: the outer loop alternates between full-set passes and non-boundary passes
	Parameters:
		dataMatIn - Data matrix
		classLabels - Data labels
		C - Penalty parameter
		toler - Tolerance
		maxIter - Maximum number of iterations
		kTup - Tuple containing kernel information
	Returns:
		oS.b - b solved by the SMO algorithm
		oS.alphas - alphas solved by the SMO algorithm
	"""
    oS = optStruct(np.mat(dataMatIn), np.mat(classLabels).transpose(), C, toler, kTup)  # Initialize data structure
    iter = 0  # Initialize current iterations
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):  # Exit when maxIter is reached or when a full-set pass produces no alpha updates
        alphaPairsChanged = 0
        if entireSet:  # Traverse the entire dataset
            for i in range(oS.m):
                alphaPairsChanged += innerL(i, oS)  # Using optimized SMO algorithm
                print("Full sample traversal:The first%d Second iteration sample:%d, alpha Optimization times:%d" % (iter, i, alphaPairsChanged))
            iter += 1
        else:  # Traversal of non boundary values
            nonBoundIs = np.nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]  # Traverse alpha not at boundaries 0 and C
            for i in nonBoundIs:
                alphaPairsChanged += innerL(i, oS)
                print("Non boundary traversal:The first%d Second iteration sample:%d, alpha Optimization times:%d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:  # After traversing once, it is changed to non boundary traversal
            entireSet = False
        elif (alphaPairsChanged == 0):  # If the alpha is not updated, the full sample traversal is calculated
            entireSet = True
        print("Number of iterations: %d" % iter)
    return oS.b, oS.alphas  # Returns b and alpha calculated by SMO algorithm

Optimized SMO algorithm

def innerL(i, oS):
    """
	Optimized SMO inner loop: tries to optimize the pair (alpha_i, alpha_j)
	Parameters:
		i - Index of the first alpha
		oS - Data structure
	Returns:
		1 if a pair of alphas was changed, otherwise 0
	"""
    # Step 1: calculate the error Ei
    Ei = calcEk(oS, i)
    # Optimize alpha_i only if its error violates the KKT conditions beyond the tolerance
    if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or (
            (oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)):
        # Select alpha_j using the inner-loop heuristic and calculate Ej
        j, Ej = selectJ(i, oS, Ei)
        # Save the old alpha values before the update, using a copy
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        # Step 2: calculate the upper and lower bounds L and H
        if (oS.labelMat[i] != oS.labelMat[j]):
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print("L==H")
            return 0
        # Step 3: calculate eta
        eta = 2.0 * oS.K[i, j] - oS.K[i, i] - oS.K[j, j]
        if eta >= 0:
            print("eta>=0")
            return 0
        # Step 4: update alpha_j
        oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
        # Step 5: trim alpha_j
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        # Update Ej to error cache
        updateEk(oS, j)
        if (abs(oS.alphas[j] - alphaJold) < 0.00001):
            print("alpha_j The change is too small")
            return 0
        # Step 6: update alpha_i
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])
        # Update Ei to error cache
        updateEk(oS, i)
        # Step 7: update b_1 and b_2
        b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, i] - oS.labelMat[j] * (
                oS.alphas[j] - alphaJold) * oS.K[i, j]
        b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, j] - oS.labelMat[j] * (
                oS.alphas[j] - alphaJold) * oS.K[j, j]
        # Step 8: according to b_1 and b_2 update b
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
            oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
            oS.b = b2
        else:
            oS.b = (b1 + b2) / 2.0
        return 1
    else:
        return 0

Error calculation

def calcEk(oS, k):
    """
	Calculate the error Ek = f(x_k) - y_k for the sample with index k
	"""
    fXk = float(np.multiply(oS.alphas, oS.labelMat).T * oS.K[:, k] + oS.b)
    Ek = fXk - float(oS.labelMat[k])
    return Ek

Inner-loop heuristic: select alpha_j and calculate Ej

def selectJ(i, oS, Ei):
    """
	Inner-loop heuristic: choose the alpha_j whose error Ej maximizes |Ei - Ej|
	Parameters:
		i - Index of alpha_i
		oS - Data structure
		Ei - Error of sample i
	Returns:
		j, Ej - Index and error of the chosen alpha_j
	"""
    maxK = -1
    maxDeltaE = 0
    Ej = 0  # initialization
    oS.eCache[i] = [1, Ei]  # Update error cache according to Ei
    validEcacheList = np.nonzero(oS.eCache[:, 0].A)[0]  # Returns the index value of data whose error is not 0
    if (len(validEcacheList)) > 1:  # There is an error that is not 0
        for k in validEcacheList:  # Traverse to find the maximum Ek
            if k == i: continue  # Skip i itself
            Ek = calcEk(oS, k)  # Calculate Ek
            deltaE = abs(Ei - Ek)  # Calculate |Ei - Ek|
            if (deltaE > maxDeltaE):  # maxDeltaE found
                maxK = k
                maxDeltaE = deltaE
                Ej = Ek
        return maxK, Ej  # Return maxK,Ej
    else:  # There is no error that is not 0
        j = selectJrand(i, oS.m)  # Randomly select the index of alpha_j
        Ej = calcEk(oS, j)  # Calculate Ej
    return j, Ej  # Return j, Ej
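The helper selectJrand is referenced above but not listed in the post; a standard definition (as in Machine Learning in Action, shown here as an assumed sketch) simply picks a random index different from i:

import random

def selectJrand(i, m):
    # Randomly select an index j in [0, m) that is different from i
    j = i
    while j == i:
        j = int(random.uniform(0, m))
    return j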

Trim alpha_j

def clipAlpha(aj, H, L):
    """
	Clip alpha_j so that it stays inside the interval [L, H]
	"""
    if aj > H:
        aj = H
    if L > aj:
        aj = L
    return aj

Update Ej to error cache

def updateEk(oS, k):
    """
	Calculate Ek and update the error cache
	Parameters: 
		oS - data structure
		k - Index of the data whose error is updated
	Returns:
		nothing
	"""
    Ek = calcEk(oS, k)  # Calculate Ek
    oS.eCache[k] = [1, Ek]  # Update error cache
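To show how the pieces fit together, here is a minimal training-and-evaluation sketch (not from the original post; the parameters C = 200, toler = 0.0001, maxIter = 100 and the RBF width 1.3 are illustrative assumptions). Prediction uses f(x) = Σ_i α_i y_i K(x_i, x) + b over the support vectors:

def testRbf(trainX, trainY, testX, testY, kTup=('rbf', 1.3)):
    b, alphas = smoP(trainX, trainY, 200, 0.0001, 100, kTup)  # train with SMO
    dataMat = np.mat(trainX)
    labelMat = np.mat(trainY).transpose()
    svInd = np.nonzero(alphas.A > 0)[0]   # indices of the support vectors
    sVs = dataMat[svInd]                  # support vector samples
    labelSV = labelMat[svInd]             # support vector labels
    errorCount = 0
    testMat = np.mat(testX)
    for i in range(np.shape(testMat)[0]):
        kernelEval = kernelTrans(sVs, testMat[i, :], kTup)  # kernel between SVs and the test point
        predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b
        if np.sign(predict) != np.sign(testY[i]):
            errorCount += 1
    print("test error rate: %.2f%%" % (100.0 * errorCount / np.shape(testMat)[0]))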

3.3 Results and analysis

The error rate is as follows:

The error rate on the test set is high; a likely reason is that there are too many features (30), which makes the samples hard to classify.
Even with the dataset from the book, which has only two features, the test-set error rate is about twice the training-set error rate.

So there is still plenty of room for improvement in the code.

4, Summary

SVM is difficult all the way from the underlying principles to the code implementation. I still do not understand it thoroughly and need to keep studying it.

Keywords: Algorithm Machine Learning
