1. Image classification
In the data-driven approach, the data is split into train_data, val_data and test_data. Models are trained on the training set with different hyperparameters, each setting is evaluated on the validation set, and only the hyperparameters that perform best on the validation set are finally applied to the test set.
Key ideas: image classifier, data-driven approach.
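A minimal sketch of this split-and-select loop, assuming X (N x D features) and y (N labels) are already loaded; the split sizes and the train_and_evaluate helper are hypothetical, purely for illustration:

import numpy as np

num_train, num_val = 49000, 1000
X_train, y_train = X[:num_train], y[:num_train]
X_val,   y_val   = X[num_train:num_train + num_val], y[num_train:num_train + num_val]
X_test,  y_test  = X[num_train + num_val:], y[num_train + num_val:]

best_hp, best_val_acc = None, -1.0
for hp in candidate_hyperparameters:                    # e.g. different k for kNN
    val_acc = train_and_evaluate(X_train, y_train, X_val, y_val, hp)   # hypothetical helper
    if val_acc > best_val_acc:
        best_hp, best_val_acc = hp, val_acc

# only the single best hyperparameter setting is ever run on the test set
test_acc = train_and_evaluate(X_train, y_train, X_test, y_test, best_hp)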
Example 1: kNN (k-nearest neighbor) algorithm
The code is divided into five parts: loading data (CIFAR-10), processing data, training the model, testing, and cross-validation.
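As a rough sketch, loading and preprocessing CIFAR-10 with the assignment's data utility might look like the following; the dataset path, the subset sizes and the reshaping choices are assumptions, not the exact assignment code:

import numpy as np
from cs231n.data_utils import load_CIFAR10   # helper shipped with the assignment code

cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'   # assumed dataset location
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# flatten each 32x32x3 image into a single row vector
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

# keep a small subset so the distance matrices stay manageable
X_train, y_train = X_train[:5000], y_train[:5000]
X_test, y_test = X_test[:500], y_test[:500]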
The focus here is on the kNN algorithm itself:
(1) Computing distances with two loops:
def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in xrange(num_test):
        for j in xrange(num_train):
            # Euclidean distance between the i-th test point and the j-th training point
            dists[i, j] = np.sqrt(np.sum((X[i, :] - self.x_train[j, :]) ** 2))
    return dists
(2) Computing distances with one loop:
def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in xrange(num_test):
        # Euclidean distances from the i-th test point to every training point, via broadcasting
        dists[i] = np.sqrt(np.sum(np.square(X[i, :] - self.x_train), axis=1))
    return dists
(3) Computing distances with matrix operations (no loops):
def compute_distances_no_loops(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    x_2 = np.sum(np.square(X), axis=1)                   # squared norms of the test points
    x_train_2 = np.sum(np.square(self.x_train), axis=1)  # squared norms of the training points
    x_xtrain = np.dot(X, self.x_train.T)                 # cross term X . x_train^T
    # expand ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 for all pairs via broadcasting
    dists = np.sqrt(x_2.reshape(-1, 1) - 2 * x_xtrain + x_train_2)
    return dists
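A quick sanity check for the vectorized version is to compare it against the two-loop version; this is a sketch, assuming classifier is an already-trained KNearestNeighbor instance:

dists_two = classifier.compute_distances_two_loops(X_test)
dists_none = classifier.compute_distances_no_loops(X_test)

# the Frobenius norm of the difference should be essentially zero
difference = np.linalg.norm(dists_two - dists_none, ord='fro')
print('difference: %f' % difference)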
(4) Prediction part (voting mechanism)
def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in xrange(num_test):
        order_dists = np.argsort(dists[i, :], axis=0)   # indices of training points, nearest first
        target_k = self.y_train[order_dists[:k]]        # labels of the k nearest neighbors
        y_pred[i] = np.argmax(np.bincount(target_k))    # majority vote among those labels
    return y_pred
The NumPy functions used in the voting mechanism:
1. np.argsort returns the indices that would sort the array; with axis=0 it sorts along columns.
2. np.argmax returns the index of the maximum value.
3. np.bincount counts the number of occurrences of each non-negative integer value.
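A tiny worked example of these three calls (the numbers are made up purely for illustration):

import numpy as np

dists_row = np.array([0.9, 0.1, 0.5, 0.3, 0.7])   # distances from one test point to 5 training points
y_train = np.array([2, 0, 1, 0, 2])               # labels of those 5 training points

order = np.argsort(dists_row)         # -> [1, 3, 2, 4, 0], nearest training points first
nearest_labels = y_train[order[:3]]   # labels of the 3 nearest neighbors -> [0, 0, 1]
counts = np.bincount(nearest_labels)  # occurrences of each label -> [2, 1]
prediction = np.argmax(counts)        # label with the most votes -> 0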
(5) Cross-validation:
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

y_train = y_train.reshape(-1, 1)
x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

k_to_accuracies = {}
for k in k_choices:
    k_to_accuracies.setdefault(k, [])   # one empty accuracy list per k

classifier = KNearestNeighbor()
for i in range(num_folds):
    # training set for this fold: all folds except fold i, which is held out for validation
    x_train_cv = np.vstack(x_train_folds[0:i] + x_train_folds[i + 1:])
    y_train_cv = np.vstack(y_train_folds[0:i] + y_train_folds[i + 1:])[:, 0]
    classifier.train(x_train_cv, y_train_cv)
    for k in k_choices:
        x_pred = x_train_folds[i]                    # the held-out validation fold
        y_pred = classifier.predict(x_pred, k=k)     # predict its labels
        num_correct = np.sum(y_pred == y_train_folds[i][:, 0])
        accuracy = float(num_correct) / len(y_pred)  # compute the accuracy
        k_to_accuracies[k] = k_to_accuracies[k] + [accuracy]   # store the accuracy for this k

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
Dictionary operations
Here we build an accuracy dictionary, k_to_accuracies, which maps each k to a list of its accuracies. The setdefault method is used: unlike get, it inserts the key with a default value when the key is not yet in the dictionary.
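A small illustration of the difference (the values are made up):

accs = {}

# get only reads: the dictionary is left unchanged if the key is missing
accs.get(5, [])          # -> [], but accs is still {}

# setdefault inserts the key with the default value when it is missing, then returns it
accs.setdefault(5, [])   # -> [], and now accs == {5: []}
accs[5].append(0.27)     # safe, because the key is guaranteed to exist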
Example 2: SVM (support vector machine)
The code is divided into: data loading, data processing, the SVM classifier, and hyperparameter tuning.
The focus is on the SVM loss itself.
With loops:
def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)   # initialize the gradient as zero
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]   # score of the correct class
        for j in xrange(num_classes):
            if j == y[i]:
                # skip the correct class
                continue
            margin = scores[j] - correct_class_score + 1   # note delta = 1; score - correct class score + 1, see the hinge-loss formula
            if margin > 0:
                loss += margin
                dW[:, j] += X[i].T
                dW[:, y[i]] -= X[i].T
    loss /= num_train
    dW /= num_train
    # Add regularization to the loss and gradient.
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W
    return loss, dW
Without loops (vectorized):
def svm_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros(W.shape)   # initialize the gradient as zero
    num_classes = W.shape[1]
    num_train = X.shape[0]

    scores = X.dot(W)
    correct_class_scores = scores[np.arange(num_train), list(y)].reshape(-1, 1)
    margins = scores - correct_class_scores + 1
    margins[margins < 0] = 0
    margins[np.arange(num_train), y] = 0          # the correct class contributes no loss
    loss = np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)

    margins[margins > 0] = 1
    e_number = np.sum(margins, axis=1)            # number of classes per sample that violate the margin
    margins[np.arange(num_train), y] -= e_number  # the correct class column is decreased that many times
    dW = np.dot(X.T, margins) / num_train
    dW += reg * W
    return loss, dW
Some practical NumPy idioms used here: axis=1 applies an operation row by row, and reshape(-1, 1) turns a 1-D array into a column vector. For a 2-D array y, y.shape[0] is the number of rows (essentially the same as len(y)) and y.shape[1] is the number of columns.
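For example, with a throwaway array just to show the shapes:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

np.sum(a, axis=1)                  # -> array([ 6, 15]), one sum per row
np.sum(a, axis=1).reshape(-1, 1)   # -> column vector [[ 6], [15]]
a.shape[0], a.shape[1], len(a)     # -> (2, 3, 2): rows, columns, rows again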
Hyperparameter tuning
learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

results = {}
best_val = -1    # The highest validation accuracy that we have seen so far.
best_svm = None

from cs231n.classifiers import LinearSVM

for lr, reg in zip(learning_rates, regularization_strengths):
    svm = LinearSVM()
    svm.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=1000, verbose=False)
    y_train_pred = svm.predict(X_train)
    train_accuracy = np.mean(y_train == y_train_pred)
    y_val_pred = svm.predict(X_val)
    val_accuracy = np.mean(y_val == y_val_pred)
    results[(lr, reg)] = (train_accuracy, val_accuracy)
    if best_val < val_accuracy:
        best_val = val_accuracy
        best_svm = svm
Here, zip(learning_rates, regularization_strengths) pairs the learning rates and regularization strengths element-wise, which gives a few (lr, reg) combinations to try; note that it does not enumerate the full grid of all possible pairs.
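If the full grid is wanted instead, itertools.product (not used in the code above, shown only as an alternative) would enumerate every combination:

import itertools

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

list(zip(learning_rates, regularization_strengths))
# -> [(1e-07, 25000.0), (5e-05, 50000.0)]   only element-wise pairs

list(itertools.product(learning_rates, regularization_strengths))
# -> [(1e-07, 25000.0), (1e-07, 50000.0), (5e-05, 25000.0), (5e-05, 50000.0)]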
Example 3: Softmax loss function
The code is divided into: data loading, data processing, the softmax classifier, and hyperparameter tuning.
The difference between softmax and SVM is only the loss function; the rest of the training process and the basic ideas are the same.
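For a single sample with made-up class scores, the two losses differ like this:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])   # made-up class scores for one image
y = 0                                  # index of the correct class

# SVM (hinge) loss: sum of margins over the incorrect classes, delta = 1
svm_loss = np.sum(np.maximum(0, scores - scores[y] + 1)) - 1.0   # subtract the correct class's own term
# -> max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9

# Softmax (cross-entropy) loss: negative log-probability of the correct class
shifted = scores - np.max(scores)
softmax_loss = -np.log(np.exp(shifted[y]) / np.sum(np.exp(shifted)))
# -> about 2.04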
With loops:
import numpy as np
from random import shuffle
from past.builtins import xrange

def softmax_loss_naive(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_class = W.shape[1]
    num_train = X.shape[0]

    for i in range(num_train):
        scores = X[i].dot(W)
        shift_scores = scores - max(scores)   # shift so the largest score is 0 (numerical stability)
        loss -= np.log(np.exp(shift_scores[y[i]]) / np.sum(np.exp(shift_scores)))
        for j in xrange(num_class):
            softmax_output = np.exp(shift_scores[j]) / np.sum(np.exp(shift_scores))
            if j == y[i]:
                dW[:, j] += (-1 + softmax_output) * X[i, :]   # gradient from the backpropagation rule
            else:
                dW[:, j] += softmax_output * X[i, :]

    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W
    return loss, dW
Without loops (vectorized):
def softmax_loss_vectorized(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_class = W.shape[1]
    num_train = X.shape[0]

    scores = X.dot(W)
    # shift the scores so e**x cannot overflow and turn the loss into nan
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_out = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape((-1, 1))

    loss = np.sum(-1 * np.log(softmax_out[range(num_train), y]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    dS = softmax_out.copy()
    dS[range(num_train), list(y)] += -1   # gradient of the loss with respect to the scores
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W
    return loss, dW
When computing dW we use the chain rule of backpropagation; it may not be clear at first reading, but it becomes clear after revisiting the derivation. The shift applied to the scores is there to keep e raised to large powers from overflowing, which would make the loss nan.
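A quick numerical illustration of why the shift matters (the numbers are arbitrary):

import numpy as np

scores = np.array([1000.0, 1001.0, 1002.0])

np.exp(scores)                             # -> [inf, inf, inf], overflow
np.exp(scores) / np.sum(np.exp(scores))    # -> [nan, nan, nan]

shifted = scores - np.max(scores)          # -> [-2, -1, 0]
np.exp(shifted) / np.sum(np.exp(shifted))  # -> [0.09, 0.24, 0.67], a valid probability distribution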