CS231N Course Learning Summary (assignment 1)

1. Image classification

In the data-driven approach, the data is split into train_data, val_data and test_data. Models are trained on the training set with different hyperparameters and evaluated on the validation set; the hyperparameters that perform best on the validation set are then used for the final evaluation on the test set.
Key concepts: image classifier, data-driven approach.
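
The three-way split can be sketched as follows. This is only an illustration on random data: the helper name split_data, the 80/10/10 ratio and the array sizes are assumptions made for the example and are not part of the assignment code.

```python
import numpy as np

def split_data(X, y, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the samples and split them into train/val/test parts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])          # random order of sample indices
    num_val = int(X.shape[0] * val_frac)
    num_test = int(X.shape[0] * test_frac)
    val_idx = idx[:num_val]
    test_idx = idx[num_val:num_val + num_test]
    train_idx = idx[num_val + num_test:]       # everything that is left
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X = np.random.rand(100, 8)            # 100 fake samples with 8 features each
y = np.random.randint(0, 10, 100)     # fake labels from 10 classes
train_data, val_data, test_data = split_data(X, y)
print(train_data[0].shape, val_data[0].shape, test_data[0].shape)
```

The test part is touched only once, after the hyperparameters have been chosen on the validation part.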

Example 1: kNN (k-nearest neighbor) algorithm

The code is divided into five parts: loading the data (CIFAR-10), processing the data, training the model, testing on the data, and cross-validation.

The focus here is on the kNN implementation:

(1) Computing the distances with two loops:

def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Euclidean distance between test point i and training point j
            dists[i, j] = np.sqrt(np.sum((X[i, :] - self.x_train[j, :]) ** 2))
    return dists

(2) Computing the distances with one loop:

def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.x_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        # Broadcasting: distance from test point i to every training point at once
        dists[i] = np.sqrt(np.sum(np.square(X[i, :] - self.x_train), axis=1))
    return dists

(3) Computing the distances with no loops (matrix operations only):

def compute_distances_no_loops(self, X):
    # Uses ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 for all pairs at once
    x_2 = np.sum(np.square(X), axis=1)  # squared norm of each test point, shape (num_test,)
    x_train_2 = np.sum(np.square(self.x_train), axis=1)  # squared norm of each training point
    x_xtrain = np.dot(X, self.x_train.T)  # dot products, shape (num_test, num_train)
    sq_dists = x_2.reshape(-1, 1) - 2 * x_xtrain + x_train_2  # broadcast to (num_test, num_train)
    dists = np.sqrt(np.maximum(sq_dists, 0))  # clip tiny negatives caused by rounding
    return dists
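
To check that the matrix form really gives the same distances as the explicit loops, a small standalone comparison can help. The sketch below uses random data and free-standing variables instead of the classifier class; the sizes (5 test points, 7 training points, 4 features) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 4))         # 5 "test" points
x_train = rng.random((7, 4))   # 7 "training" points

# Reference result: explicit double loop.
dists_loops = np.zeros((5, 7))
for i in range(5):
    for j in range(7):
        dists_loops[i, j] = np.sqrt(np.sum((X[i] - x_train[j]) ** 2))

# Vectorized result: ||a||^2 - 2 a.b + ||b||^2, broadcast to shape (5, 7).
sq = (np.sum(X ** 2, axis=1).reshape(-1, 1)
      - 2 * X.dot(x_train.T)
      + np.sum(x_train ** 2, axis=1))
dists_vec = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives caused by rounding

max_diff = np.max(np.abs(dists_loops - dists_vec))
print(max_diff)  # should be close to zero
```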

(4) Prediction part (voting mechanism)

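def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        order_dists = np.argsort(dists[i, :])  # training indices sorted from nearest to farthest
        target_k = self.y_train[order_dists[:k]]  # labels of the k nearest neighbors
        y_pred[i] = np.argmax(np.bincount(target_k))  # majority vote; ties go to the smaller label
    return y_pred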

The NumPy functions used in the voting mechanism:

1. np.argsort returns the indices that would sort an array; on a 2-D array, axis=0 sorts down each column.
2. np.argmax returns the index of the maximum value in an array.
3. np.bincount counts how many times each non-negative integer occurs.
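
How the three functions combine into a vote can be seen on a tiny hand-made example. The distances and labels below are invented for illustration.

```python
import numpy as np

dists_row = np.array([0.9, 0.1, 0.4, 0.3, 0.8])  # distances from one test point to 5 training points
y_train = np.array([2, 1, 1, 0, 2])               # labels of those training points
k = 3

order = np.argsort(dists_row)          # nearest to farthest: [1, 3, 2, 4, 0]
nearest_labels = y_train[order[:k]]    # labels of the k nearest points: [1, 0, 1]
votes = np.bincount(nearest_labels)    # votes per class: [1, 2]
prediction = np.argmax(votes)          # class with the most votes: 1
print(prediction)
```

Because np.argmax returns the first maximum, a tie between classes is resolved in favor of the smaller label.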


Cross-validation:

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
k_to_accuracies = {}
y_train = y_train.reshape(-1, 1)
x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
for k in k_choices:
    k_to_accuracies.setdefault(k, [])  # start every k with an empty accuracy list
classifier = KNearestNeighbor()
for i in range(num_folds):
    # Train on every fold except fold i, which is held out as the validation fold
    x_fold_train = np.vstack(x_train_folds[0:i] + x_train_folds[i + 1:])
    y_fold_train = np.vstack(y_train_folds[0:i] + y_train_folds[i + 1:])[:, 0]
    classifier.train(x_fold_train, y_fold_train)
    for k in k_choices:
        x_pred = x_train_folds[i]  # inputs of the validation fold
        y_pred = classifier.predict(x_pred, k=k)  # predicted labels
        num_correct = np.sum(y_pred == y_train_folds[i][:, 0])
        accuracy = float(num_correct) / len(y_pred)  # validation accuracy for this fold
        k_to_accuracies[k] = k_to_accuracies[k] + [accuracy]  # record the accuracy under k
# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))

Dictionary operation

Here we build a dictionary, k_to_accuracies, that maps each k to the list of accuracies measured for it. The setdefault method is used for this. Unlike get, which only returns a default when the key is missing, setdefault also inserts the key with that default value into the dictionary.
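
The difference between the two methods is easy to see in isolation. The accuracy values below are made up for the example.

```python
k_to_accuracies = {}

# get returns the default but leaves the dictionary unchanged.
missing = k_to_accuracies.get(1, [])
keys_after_get = list(k_to_accuracies)

# setdefault inserts the key with the default value and returns that value.
k_to_accuracies.setdefault(1, []).append(0.25)
k_to_accuracies.setdefault(1, []).append(0.30)  # the key exists now, so the same list is reused
print(keys_after_get, k_to_accuracies)
```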

Example 2: SVM (support vector machine)

The code is divided into: data loading, data processing, the SVM classifier, and hyperparameter tuning.

Focus on the svm algorithm.

With loops:

def svm_loss_naive(W, X, y, reg):
  dW = np.zeros(W.shape)  # initialize the gradient as zero
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]  # score of the correct class
    for j in range(num_classes):
      if j == y[i]:
        continue  # skip the correct class
      margin = scores[j] - correct_class_score + 1  # hinge margin with delta = 1
      if margin > 0:
        loss += margin
        dW[:, j] += X[i]
        dW[:, y[i]] -= X[i]
  loss /= num_train
  dW /= num_train  # average the gradient over the samples, like the loss
  # Add regularization to the loss and to its gradient.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W
  return loss, dW
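
The formula behind the margin line is, as far as can be told from the code, the standard multiclass hinge loss with delta = 1. Written out in notation chosen for this summary (s_{ij} for the score of class j on sample i, N for num_train and \lambda for reg are assumed symbols), it reads:

```latex
L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i}
      \max\left(0,\; s_{ij} - s_{i y_i} + 1\right)
    + \frac{\lambda}{2} \sum_{k,l} W_{kl}^{2},
\qquad s_{ij} = x_i W_{:,j}

% Gradient contribution of one sample i, for every class j \neq y_i
% whose margin s_{ij} - s_{i y_i} + 1 is positive:
\frac{\partial L_i}{\partial W_{:,j}} = x_i^{\top},
\qquad
\frac{\partial L_i}{\partial W_{:,y_i}} = -x_i^{\top}
```

Each class that violates the margin adds x_i to its own column and subtracts x_i from the column of the correct class, which is what the two dW updates inside the loop do.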

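No loops:

def svm_loss_vectorized(W, X, y, reg):
  num_train = X.shape[0]
  scores = X.dot(W)
  margins = np.maximum(0, scores - scores[range(num_train), y].reshape(-1, 1) + 1)  # delta = 1
  margins[range(num_train), y] = 0  # the correct class contributes no loss
  loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
  margins[margins > 0] = 1  # mark the classes that violate the margin
  e_number = np.sum(margins, axis=1)  # number of violating classes per sample
  margins[range(num_train), y] = -e_number
  dW = X.T.dot(margins) / num_train + reg * W
  return loss, dW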

Some useful NumPy idioms: axis=1 applies an operation along each row, giving one result per row, and reshape(-1, 1) turns a 1-D array into a column.
(For a 2-D array y, y.shape[0] is the number of rows, the same as len(y), and y.shape[1] is the number of columns.)
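
A small example makes the axis and reshape behavior concrete. The score values are invented for illustration.

```python
import numpy as np

scores = np.array([[1.0, 5.0, 2.0],
                   [4.0, 0.0, 3.0]])   # 2 rows (samples), 3 columns (classes)

row_max = np.max(scores, axis=1)       # one value per row: [5., 4.]
col_max = np.max(scores, axis=0)       # one value per column: [4., 5., 3.]

column = row_max.reshape(-1, 1)        # shape (2,) becomes shape (2, 1)
shifted = scores - column              # broadcasting subtracts each row's max from that row
print(scores.shape[0], scores.shape[1], len(scores))
```

Without the reshape, subtracting an array of shape (2,) from one of shape (2, 3) would fail, because broadcasting aligns the trailing dimensions.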

Hyperparameter tuning

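learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None
from cs231n.classifiers import LinearSVM
# X_train, y_train, X_val, y_val are assumed to be the splits from the data-processing step
for lr, reg in zip(learning_rates, regularization_strengths):
    svm = LinearSVM()
    svm.train(X_train, y_train, learning_rate=lr, reg=reg)
    train_accuracy = np.mean(y_train == svm.predict(X_train))
    val_accuracy = np.mean(y_val == svm.predict(X_val))
    results[(lr, reg)] = (train_accuracy, val_accuracy)
    if best_val < val_accuracy:
        best_val = val_accuracy  # keep the best validation accuracy and its model
        best_svm = svm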


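Here zip(learning_rates, regularization_strengths) supplies the (lr, reg) pairs for hyperparameter tuning. Note that zip pairs the two lists element by element, so only (1e-7, 2.5e4) and (5e-5, 5e4) are tried; covering every combination of lr and reg requires nested loops.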
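
To see exactly which (lr, reg) settings a loop visits, the pairs can be listed directly. The sketch below compares zip with itertools.product from the standard library; using product here is a suggestion for illustration, not something the original code does.

```python
from itertools import product

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# zip pairs elements by position: first with first, second with second.
paired = list(zip(learning_rates, regularization_strengths))

# product yields the full grid: every learning rate with every strength.
grid = list(product(learning_rates, regularization_strengths))

print(len(paired), len(grid))
```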

Example 3: softmax loss function

The code is divided into: data loading, data processing, the softmax classifier, and hyperparameter tuning.

The only difference between softmax and svm is the loss function; the rest of the training process and the basic ideas are the same.

With loops:

import numpy as np

def softmax_loss_naive(W, X, y, reg):
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)
  num_train = X.shape[0]
  num_class = W.shape[1]
  for i in range(num_train):
    scores = X[i].dot(W)
    shift_scores = scores - np.max(scores)  # shift so the largest score is 0 (numerical stability)
    loss -= np.log(np.exp(shift_scores[y[i]]) / np.sum(np.exp(shift_scores)))
    for j in range(num_class):
      softmax_output = np.exp(shift_scores[j]) / np.sum(np.exp(shift_scores))
      if j == y[i]:
        dW[:, j] += (-1 + softmax_output) * X[i, :]  # correct class: (p_j - 1) * x_i
      else:
        dW[:, j] += softmax_output * X[i, :]  # other classes: p_j * x_i

  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dW /= num_train
  dW += reg * W
  return loss, dW
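
The two gradient updates in the inner loop follow from differentiating the cross-entropy loss with respect to the scores and then applying the chain rule. A short derivation, in notation chosen for this summary (p_{ij} for the softmax output and s_{ij} for the score of class j on sample i are assumed symbols):

```latex
p_{ij} = \frac{e^{s_{ij}}}{\sum_{k} e^{s_{ik}}},
\qquad
L_i = -\log p_{i y_i},
\qquad s_{ij} = x_i W_{:,j}

\frac{\partial L_i}{\partial s_{ij}} = p_{ij} - \mathbf{1}[j = y_i]
\quad\Longrightarrow\quad
\frac{\partial L_i}{\partial W_{:,j}}
  = \left(p_{ij} - \mathbf{1}[j = y_i]\right) x_i^{\top}
```

For the correct class the factor is p_{ij} - 1, and for every other class it is p_{ij}.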

No loops:

def softmax_loss_vectorized(W, X, y, reg):
  num_train = X.shape[0]
  scores = X.dot(W)
  # Shift each row so its largest score is 0; this keeps exp from overflowing and producing nan
  shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
  softmax_out = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
  loss = np.sum(-1 * np.log(softmax_out[range(num_train), y]))
  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dS = softmax_out.copy()  # gradient of the loss with respect to the scores
  dS[range(num_train), list(y)] += -1  # subtract 1 at the correct class
  dW = (X.T).dot(dS)  # chain rule: scores = X.dot(W)
  dW = dW / num_train + reg * W
  return loss, dW

When computing dW we use the chain rule from the backpropagation algorithm. This is hard to follow at first, but it becomes clear on a second reading. The scores are shifted so that e^x does not become too large and overflow, which would turn the function value into nan.
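
The effect of the shift can be demonstrated on a single row of large scores. The values below are chosen only to force an overflow; np.errstate is used to silence the warnings NumPy would otherwise print.

```python
import numpy as np

scores = np.array([1000.0, 1001.0, 1002.0])

with np.errstate(over="ignore", invalid="ignore"):
    # exp(1000) overflows to inf, and inf / inf gives nan
    naive = np.exp(scores) / np.sum(np.exp(scores))

shift_scores = scores - np.max(scores)   # the largest entry becomes 0
stable = np.exp(shift_scores) / np.sum(np.exp(shift_scores))

print(naive, stable)
```

Subtracting the same constant from every score does not change the softmax output, because the factor cancels between numerator and denominator.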


Added by Shaudh on Sat, 17 Aug 2019 13:14:17 +0300